Detecting Multiple Aliases in Social Media
Amendra Shrestha, Lisa Kaati, Fredrik Johansson
26th August
  
Overview
•  Introduction
•  Reasons for multiple aliases
•  Techniques for detecting aliases
•  Dataset
•  Experiment and Results
•  Conclusion and Future work
  
Motivation
Terrorists make extensive use of social media / discussion
[Slide images labeled [1], [3], [2]]
1. A. Y. Zelin and R. B. Fellow, “The state of global jihad online,” New America Foundation, 2013.
2. J. Brynielsson, A. Horndahl, F. Johansson, L. Kaati, C. Mårtenson, and P. Svenson, “Harvesting and analysis of weak signals for detecting lone wolf terrorists,” Security Informatics, 2013, 2:11.
3. http://www.businessinsider.com/facebook-fake-likes-and-accounts-2012-12
  
Problems
•  changing IP addresses and URLs frequently
•  use of anonymization techniques like Onion Routing and Crowds
  	
  
Our Objective
•  Develop methods for detecting users with multiple aliases
  
Use of multiple aliases
Bizhant Pokheral
  
  
Cases for multiple aliases
•  Case I: Alter Ego Aliases
   •  concealed case
•  Case II: Multiple Aliases
   •  non-concealed case
  
Possible reasons for multiple aliases
•  Case I: Alter ego aliases
   •  banned by administrator
   •  lost trust of the group
   •  developed bad personal relationships
   •  to support his arguments
   •  privacy reasons
•  Case II: Multiple aliases
   •  banned by administrator
   •  banned for inactivity
   •  forgotten password
   •  alias name is already used
  
Assumptions
•  Case I: Alter ego aliases
   •  do not have the same friend network
   •  write in at least one common thread
   •  no name equality
   •  similar time profile
   •  similarity in writing style
•  Case II: Multiple aliases
   •  have a similar friend network
   •  do not write in the same thread
   •  equality in name
   •  similar time profile
   •  similarity in writing style
  
Techniques for detecting aliases
•  String-based matching
•  Time profile-based matching
•  Stylometric matching
•  Social network-based matching
  
String-based matching
•  Based on alias names
•  For the multiple aliases case
•  Edit distance measures
   •  implemented Jaro-Winkler distance [1]
1. W. E. Winkler, “String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage,” in Proceedings of the Section on Survey Research Methods, 1990, pp. 354–359.
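The slide names Jaro-Winkler as the implemented measure but gives no parameters. The following is a minimal Python sketch, assuming the commonly used prefix scale of 0.1 and a maximum prefix length of 4 (both are assumptions, not values from the slides). The function names are illustrative. The functions return a similarity in [0, 1]; a distance can be taken as one minus that value.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (p and max_prefix are assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


# Textbook example pair, not taken from the boards.ie data.
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Because the prefix bonus rewards names that start the same way, the measure suits the non-concealed case, where a user re-registers under a near-identical alias.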
  
Time profile-based matching
•  Post created time
•  Time profiles based on relative distribution of the time of day
•  Times of post: <7:01, 7:25, 7:29, 7:40, 8:05, 8:55, 9:27, 10:17, 10:43, 13:11, 14:19, 14:59>
•  Frequency count: <0, 0, 0, 0, 0, 0, 4, 2, 1, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
•  Normalized feature vector: <0, 0, 0, 0, 0, 0, 0.33, 0.16, 0.083, 0.16, 0, 0, 0.083, 0.16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
•  Calculate Euclidean distance between vectors
Fig: Time profile distribution
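The steps on this slide (count posts per hour of day, normalize, compare with Euclidean distance) can be sketched in Python as follows. This is a minimal illustration; the function names are hypothetical, and bins are indexed by hour 0 to 23. The four posts in the 7:00 hour therefore land at index 7, whereas the slide's vector appears to count positions from 1.

```python
import math


def time_profile(post_times):
    """Return a 24-bin normalized histogram of posting hours.

    post_times: iterable of "H:MM" strings (24-hour clock).
    """
    counts = [0] * 24
    for t in post_times:
        hour = int(t.split(":")[0])
        counts[hour] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24
    return [c / total for c in counts]


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Posting times from the slide.
times = ["7:01", "7:25", "7:29", "7:40", "8:05", "8:55",
         "9:27", "10:17", "10:43", "13:11", "14:19", "14:59"]
profile = time_profile(times)

# A second alias with made-up posting times, for illustration only.
other = time_profile(["7:10", "7:50", "8:30", "21:15"])
distance = euclidean(profile, other)
```

A smaller distance means the two aliases are active at more similar times of day.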
  
Stylometric matching
•  Everyone has a unique writing style
•  Statistical analysis of writing style
•  “writeprint”
•  Calculate cosine of angle between the feature vectors $p_i$ and $q_i$ of aliases p and q, respectively
1. A. Narayanan, H. Paskov, N. Gong, J. Bethencourt, E. Stefanov, E. Shin, and D. Song, “On the feasibility of internet-scale author identification,” in 2012 IEEE Symposium on Security and Privacy (SP), May 2012, pp. 300–314.
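The cosine measure named on this slide can be sketched in Python as below. The slides do not list the stylometric features, so the feature extractor here is a stand-in: it uses relative frequencies of a small, hypothetical list of function words. Only the cosine computation reflects the slide.

```python
import math
from collections import Counter

# Hypothetical feature set; the slides do not specify the features used.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "a", "is", "that"]


def style_vector(text):
    """Relative frequency of each function word in the text (toy feature vector)."""
    words = text.lower().split()
    if not words:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]


def cosine_similarity(p, q):
    """Cosine of the angle between vectors p and q; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Made-up posts, for illustration only.
vec_p = style_vector("the answer is that the thread is in the wrong forum")
vec_q = style_vector("that is the point of the thread and of the forum")
similarity = cosine_similarity(vec_p, vec_q)
```

Since all features are non-negative frequencies, the similarity falls in [0, 1], with values near 1 indicating closely matching style vectors.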
  
Social network-based matching
•  Friend Equality
   •  friend network
   •  number of common friends
•  Thread Equality (Discussion Boards)
   •  thread network
   •  communication patterns
•  Jaccard similarity coefficient
  	
  
	
  
	
  
[...] aliases belong to the same user or not. In general, it is likely that both aliases will make postings in the same thread if they are alter egos, since the reason for creating an alter ego or sockpuppet often is to support one’s own arguments.
No matter if the constructed social network is based on friend-, thread- or topic information, we use vertex similarity to calculate how similar two aliases are in terms of their social network. The vertex similarity can be calculated as a function of the number of neighbors in common for two aliases. If the total number of neighbors should not impact the results too much, a normalization process in which the node degrees are taken into account is needed. Let $\Gamma_p$ be the neighborhood of vertex (alias) p in the network and $\Gamma_q$ be the neighborhood of vertex (alias) q. Now, the number of common neighbors is calculated as $|\Gamma_p \cap \Gamma_q|$. The normalization can be done in various ways (such as with Dice or cosine similarity), but in our implementation we make use of the Jaccard similarity coefficient J(p, q), where:
$$J(p, q) = \frac{|\Gamma_p \cap \Gamma_q|}{|\Gamma_p \cup \Gamma_q|} \tag{3}$$
In Figure 2 we illustrate the ego networks of aliases A and C, where they have two neighbors in common (E and F).
Fig: Friend Network
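The Jaccard coefficient over neighborhoods can be sketched in Python with sets. In this illustration, aliases A and C share neighbors E and F as in the text; the remaining neighbors (B, D, G) are made up, since the figure itself is not reproduced here.

```python
def jaccard(neighbors_p, neighbors_q):
    """Jaccard similarity of two neighbor sets; 0.0 if both are empty."""
    union = neighbors_p | neighbors_q
    if not union:
        return 0.0
    return len(neighbors_p & neighbors_q) / len(union)


# Ego networks: E and F are the common neighbors; B, D, G are hypothetical.
neighbors_a = {"B", "D", "E", "F"}
neighbors_c = {"E", "F", "G"}

print(jaccard(neighbors_a, neighbors_c))  # 2 common / 5 total = 0.4
```

The same function works whether the neighbors are friends, threads, or topics, as long as each alias is represented by the set of items it is linked to.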
  
Matching of aliases
•  Multiple aliases
   •  all above techniques
•  Alter ego
   •  all except string-based technique
•  Combination of techniques
   •  depending upon size of dataset
   •  all at once
   •  one at a time
•  Average of the results of the matching techniques
	
  
Dataset
•  Irish discussion forum boards.ie data
   •  SIOC format
•  Available data
   •  10 years of data
   •  50 gigabytes of disk space
   •  9 million documents
•  Used data
   •  data from the year 2008
   •  995 megabytes in size
   •  forums, threads, posts, users and FOAF documents
   •  more than 1200 users (posted more than 60 messages)
   •  220K posts
  
Experiment
[Diagram: users 1, 2, 3, 4, ..., N; alias pairs such as 1_A/1_B and 2_A/2_B; alias 1_A compared against 1_B, 2_B, 3_B, 4_B, ..., N_B]

Experiment Result
User 1   User 2   Stylo (Rank)   Time (Rank)   Fusion
1_A      1_B      1              1             1
         3_B      2              2             2
         2_B      3              3             3
         .        .              .             .
         .        .              .             .
         4_B      .              .             .
         N_B      N              N             N
  
Result
[Chart: TOP-3 accuracy (0% to 100%) vs. number of users (50 to 1000) for Time+Stylometry, Time, and Stylometry]
[Chart: TOP-1 accuracy (0% to 100%) vs. number of users (50 to 1000) for Time+Stylometry, Time, and Stylometry]
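The TOP-1 and TOP-3 accuracy plotted in these charts can be computed along the following lines. This is a sketch under an assumed naming convention taken from the experiment table: the true match of alias X_A is X_B. The function name and the example rankings are hypothetical.

```python
def top_k_accuracy(rankings, k):
    """Fraction of aliases whose true match appears among the top k candidates.

    rankings: dict mapping an alias such as "1_A" to its candidate list,
    ordered from most to least similar. The true match of "X_A" is
    assumed to be "X_B".
    """
    if not rankings:
        return 0.0
    hits = 0
    for alias, candidates in rankings.items():
        true_match = alias.rsplit("_", 1)[0] + "_B"
        if true_match in candidates[:k]:
            hits += 1
    return hits / len(rankings)


# Made-up rankings for three aliases.
rankings = {
    "1_A": ["1_B", "3_B", "2_B"],  # true match ranked first
    "2_A": ["3_B", "2_B", "1_B"],  # true match ranked second
    "3_A": ["1_B", "2_B", "3_B"],  # true match ranked third
}
top1 = top_k_accuracy(rankings, 1)
top3 = top_k_accuracy(rankings, 3)
```

Repeating this for growing numbers of users would give one accuracy value per point on the x-axis of each chart.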
  
Conclusion
•  Presented 4 different types of techniques
•  Implemented matching techniques
•  Experiments using time and stylometric
•  Time gives better results than stylometric
•  Combining the results of each matching technique gives better results
  
Future Work
•  This is just the beginning
•  Maximize test result percentage
•  Fusion of techniques
•  Test on big dataset
  
	
  
Questions	
  

一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 

Detecting Multiple Aliases in Social Media

  • 1. Detecting Multiple Aliases in Social Media
      Amendra Shrestha, Lisa Kaati, Fredrik Johansson
      26th August
  • 2. Overview
      • Introduction
      • Reasons for multiple aliases
      • Techniques for detecting aliases
      • Dataset
      • Experiment and Results
      • Conclusion and Future work
  • 3. Motivation
      • terrorists make extensive use of social media / discussion
      [1] [3] [2]
      1. A. Y. Zelin and R. B. Fellow, “The state of global jihad online,” New America Foundation, 2013.
      2. J. Brynielsson, A. Horndahl, F. Johansson, L. Kaati, C. Mårtenson, and P. Svenson, “Harvesting and analysis of weak signals for detecting lone wolf terrorists,” Security Informatics, 2013, 2:11.
      3. http://www.businessinsider.com/facebook-fake-likes-and-accounts-2012-12
  • 4. Problems
      • changing IP address and URLs frequently
      • use of anonymization techniques like Onion Routing and Crowds
  • 5. Our Objective
      • Develop methods for detecting users with multiple aliases
  • 6. Use of multiple aliases
  • 7. Use of multiple aliases (screenshot text: Bizhant Pokheral)
  • 8. Use of multiple aliases (screenshot text: Bizhant Pokheral)
  • 9. Cases for multiple aliases
      • Case I: Alter ego aliases
          • concealed case
      • Case II: Multiple aliases
          • non-concealed case
  • 10. Possible reasons for multiple aliases
      • Case I: Alter ego aliases
          • banned by administrator
          • lost trust of the group
          • developed bad personal relationships
          • to support his arguments
          • privacy reasons
      • Case II: Multiple aliases
          • banned by administrator
          • banned for inactivity
          • forgotten password
          • alias name is already used
  • 11. Assumptions
      • Case I: Alter ego aliases
          • do not have the same friend network
          • write in at least one common thread
          • no name equality
          • similar time profile
          • similarity in writing style
      • Case II: Multiple aliases
          • have a similar friend network
          • do not write in the same thread
          • equality in name
          • similar time profile
          • similarity in writing style
  • 12. Techniques for detecting aliases
      • String-based matching
      • Time profile-based matching
      • Stylometric matching
      • Social network-based matching
  • 13. String-based matching
      • Based on alias names
      • For multiple aliases case
      • Edit distance measures
          • implemented Jaro-Winkler distance [1]
      1. W. E. Winkler, “String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage,” in Proceedings of the Section on Survey Research Methods, 1990, pp. 354–359.
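The slide names the Jaro-Winkler measure but does not show an implementation. The following is a minimal sketch of the standard textbook formulation, not the authors' code. The function names and the defaults `prefix_scale=0.1` and `max_prefix=4` are assumptions taken from the commonly cited definition. The function returns a similarity in [0, 1]; a distance, if needed, can be taken as one minus this value.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 means identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings sharing a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


# Hypothetical alias names, used only for illustration.
score = jaro_winkler("MARTHA", "MARHTA")
```

Because the prefix bonus rewards shared leading characters, the measure suits the non-concealed case, where a user re-registers under a slightly altered name.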
  • 14. Time profile-based matching
      • Post created time
      • Time profiles based on relative distribution of the time of day
      • Times of post: <7:01, 7:25, 7:29, 7:40, 8:05, 8:55, 9:27, 10:17, 10:43, 13:11, 14:19, 14:59>
      • Frequency count: <0, 0, 0, 0, 0, 0, 4, 2, 1, 2, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
      • Normalized feature vector: <0, 0, 0, 0, 0, 0, 0.33, 0.16, 0.083, 0.16, 0, 0, 0.083, 0.16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
      • Calculate Euclidean distance between vectors
      Fig: Time profile distribution
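The steps on this slide (bin post times by hour, normalize, compare with Euclidean distance) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation. The function names are hypothetical, and the bins here are assumed to be indexed by hour 0 to 23, so the four posts in the 7 o'clock hour land at index 7; the vector on the slide places them one position earlier, which suggests a different bin numbering.

```python
import math
from collections import Counter


def time_profile(post_times):
    """Return a 24-bin profile: the share of posts made in each hour of the day.

    post_times is a list of "H:MM" or "HH:MM" strings (assumed format).
    """
    total = len(post_times)
    if total == 0:
        return [0.0] * 24
    per_hour = Counter(int(t.split(":")[0]) for t in post_times)
    return [per_hour[h] / total for h in range(24)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Post times taken from the slide.
times = ["7:01", "7:25", "7:29", "7:40", "8:05", "8:55",
         "9:27", "10:17", "10:43", "13:11", "14:19", "14:59"]
profile = time_profile(times)

# A second, made-up alias that only posts in the evening.
other = time_profile(["20:10", "21:45", "22:05"])
distance = euclidean(profile, other)
```

A smaller distance means the two aliases are active at more similar times of day.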
  • 15. Stylometric matching
      • Everyone has a unique writing style
      • Statistical analysis of writing style
          • “writeprint” [1]
      • Calculate cosine of angle between feature vectors:
          $$\cos\theta = \frac{\sum_i p_i q_i}{\sqrt{\sum_i p_i^2}\,\sqrt{\sum_i q_i^2}}$$
        where $p_i$ and $q_i$ are the components of the feature vectors of aliases p and q respectively
      1. A. Narayanan, H. Paskov, N. Gong, J. Bethencourt, E. Stefanov, E. Shin, and D. Song, “On the feasibility of internet-scale author identification,” in 2012 IEEE Symposium on Security and Privacy (SP), May 2012, pp. 300–314.
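As a rough illustration of comparing writing styles by the cosine of the angle between feature vectors, the sketch below uses a deliberately tiny, hypothetical feature set (relative frequencies of a few function words). The slide does not list the features the authors used, so `FUNCTION_WORDS` and `style_features` are assumptions; only the cosine computation follows the slide.

```python
import math

# Hypothetical feature set; real stylometric systems use many more features.
FUNCTION_WORDS = ("the", "and", "of", "to", "i")


def style_features(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    total = len(words) or 1  # avoid division by zero on empty text
    return [words.count(w) / total for w in FUNCTION_WORDS]


def cosine_similarity(p, q):
    """Cosine of the angle between feature vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_p * norm_q)


# Made-up posts from two aliases, for illustration only.
p = style_features("I went to the shop and I bought the last of the bread")
q = style_features("the end of the road leads to the sea and I like it")
similarity = cosine_similarity(p, q)
```

Values close to 1 indicate similar feature distributions; values close to 0 indicate little overlap.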
  • 16. Social network-based matching
      • Friend Equality
          • friend network
          • number of common friends
      • Thread Equality (Discussion Boards)
          • thread network
          • communication patterns
      • Jaccard similarity coefficient
      Excerpt shown on the slide:
      … aliases belong to the same user or not. In general, it is likely that both aliases will make postings in the same thread if they are alter egos, since the reason for creating an alter ego or sockpuppet often is to support one’s own arguments. No matter if the constructed social network is based on friend-, thread- or topic information, we use vertex similarity to calculate how similar two aliases are in terms of their social network. The vertex similarity can be calculated as a function of the number of neighbors in common for two aliases. If the total number of neighbors should not impact the results too much, a normalization process in which the node degrees are taken into account is needed. Let $\Gamma_p$ be the neighborhood of vertex (alias) p in the network and $\Gamma_q$ be the neighborhood of vertex (alias) q. Now, the number of common neighbors is calculated as $|\Gamma_p \cap \Gamma_q|$. The normalization can be done in various ways (such as with Dice or cosine similarity), but in our implementation we make use of the Jaccard similarity coefficient J(p, q), where:
          $$J(p, q) = \frac{|\Gamma_p \cap \Gamma_q|}{|\Gamma_p \cup \Gamma_q|} \qquad (3)$$
      In Figure 2 we illustrate the ego networks of aliases A and C, where they have two neighbors in common (E and F).
      Fig: Friend Network
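The Jaccard coefficient on neighbor sets is straightforward to compute. In the sketch below, the neighbor sets of aliases A and C are partly made up: the excerpt only states that A and C share the neighbors E and F, so the remaining neighbors (B and D) are assumptions added for illustration.

```python
def jaccard(neighbors_p, neighbors_q):
    """Jaccard similarity of two neighbor sets: |intersection| / |union|."""
    union = neighbors_p | neighbors_q
    if not union:
        return 0.0  # convention chosen here when both aliases have no neighbors
    return len(neighbors_p & neighbors_q) / len(union)


# Ego networks: E and F are shared (as in the excerpt); B and D are hypothetical.
neighbors_a = {"B", "E", "F"}
neighbors_c = {"D", "E", "F"}
similarity = jaccard(neighbors_a, neighbors_c)  # 2 shared out of 4 distinct
```

The same function applies whether the neighbor sets come from a friend network or from the set of threads an alias has posted in.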
  • 17. Matching of aliases
      • Multiple aliases
          • all above techniques
      • Alter ego
          • all except string-based technique
      • Combination of techniques
          • depending upon size of dataset
          • all at once
          • one at a time
      • Average of the results of the matching techniques
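Averaging the results of several matching techniques can be sketched as below. This is one plausible reading of the slide, not a description of the authors' fusion step. It assumes each technique has already produced a similarity score per candidate alias on a comparable scale, so a distance such as the Euclidean time-profile distance would first have to be converted to a similarity. The function names and all scores are hypothetical.

```python
def fuse_scores(score_tables):
    """Average the per-candidate scores of several matching techniques.

    score_tables is a list of dicts mapping candidate alias -> similarity.
    Only candidates scored by every technique are kept.
    """
    if not score_tables:
        return {}
    candidates = set.intersection(*(set(table) for table in score_tables))
    return {c: sum(table[c] for table in score_tables) / len(score_tables)
            for c in candidates}


def rank(scores):
    """Candidates ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up similarity scores of alias 1_A against three candidate aliases.
stylometric = {"1_B": 0.9, "2_B": 0.4, "3_B": 0.6}
time_based = {"1_B": 0.8, "2_B": 0.5, "3_B": 0.7}

fused = fuse_scores([stylometric, time_based])
ranking = rank(fused)
```

An unweighted mean treats every technique as equally reliable; weighting the techniques differently would be a natural variation.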
  • 18. Dataset
      • Data from the Irish discussion forum boards.ie
          • SIOC format
      • Available data
          • 10 years of data
          • 50 gigabytes of disk space
          • 9 million documents
      • Used data
          • data from 2008
          • 995 megabytes in size
          • forums, threads, posts, users and FOAF documents
          • more than 1200 users (each having posted more than 60 messages)
          • 220K posts
  • 19. Experiment
      Experiment Result
      User 1   User 2   Stylo (Rank)   Time (Rank)   Fusion
      1_A      1_B      1              1             1
               3_B      2              2             2
               2_B      3              3             3
               .        .              .             .
               4_B      .              .             .
               N_B      N              N             N
      Fig: Users 1 to N, with aliases 1_A and 2_A matched against candidates 1_B, 2_B, 3_B, 4_B, ..., N_B
  • 20. Result
      Fig: TOP-1 and TOP-3 accuracy (0% to 100%) against number of users (50 to 1000) for Time+Stylometry, Time, and Stylometry
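The TOP-1 and TOP-3 figures presumably report how often the true counterpart of an alias appears among the first one or three ranked candidates. A minimal sketch of such a measure follows. The evaluation procedure is an assumption inferred from the slides, and the rankings and ground truth are made up.

```python
def top_k_accuracy(rankings, truth, k):
    """Share of aliases whose true counterpart is among the top k candidates.

    rankings maps an alias to its ranked candidate list (best first);
    truth maps the same alias to its true counterpart.
    """
    if not rankings:
        return 0.0
    hits = sum(1 for alias, ranked in rankings.items()
               if truth[alias] in ranked[:k])
    return hits / len(rankings)


# Hypothetical rankings for three aliases.
rankings = {
    "1_A": ["1_B", "3_B", "2_B"],
    "2_A": ["3_B", "2_B", "1_B"],
    "3_A": ["1_B", "2_B", "3_B"],
}
truth = {"1_A": "1_B", "2_A": "2_B", "3_A": "3_B"}

top1 = top_k_accuracy(rankings, truth, 1)
top3 = top_k_accuracy(rankings, truth, 3)
```

In this toy data only 1_A has its counterpart ranked first, while every counterpart falls within the top three.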
  • 21. Conclusion
      • Presented 4 different types of techniques
      • Implemented matching techniques
      • Experiments using time and stylometric matching
      • Time gives better results than stylometric matching
      • Combining the results of each matching technique gives better results
  • 22. Future Work
      • This is just the beginning
      • Maximize test result percentage
      • Fusion of techniques
      • Test on a bigger dataset