Identification and Analysis of Malicious Content on Facebook: A Survey
Prateek Dewan (PhD1111)
Comprehensive Examination
November 24, 2014
Committee Members
Dr. Sambuddho Chakravarty (IIITD)
Dr. Anupam Joshi (IIITD / UMBC)
Dr. Anand Kashyap (Symantec Research)
Dr. Ponnurangam K. (Advisor)
cerc.iiitd.ac.in

Outline
•What is Facebook?
•Why Facebook?
•Malicious content on Facebook
•Identification techniques
•Research gaps
•Opportunities

What is Facebook?
•World’s biggest Online Social Network (OSN)
•Users connect by mutual consent
•Used to
  •keep in touch with friends and family
  •share what people are up to
  •consume information about real world events ¹
¹ http://www.pewresearch.org/fact-tank/2014/09/24/how-social-media-is-reshaping-news/

Why Facebook?
•Largest online social network in the world
  •1.32 billion monthly active users
  •4.75 billion posts per day
  •Over 300 petabytes of data
•Spammers exploit the context of events to lure victims into scams.
•Facebook spammers make $200 million just by posting links. ²
² http://www.theguardian.com/technology/2013/aug/28/facebook-spam-202-million-italian-research

Example (1 / 2)

Example (2 / 2)

Types of malicious content on Facebook
•Advertising Scams
•Fake information
•Malware

Focus
Online Social Media · Malicious Content · Events
•Vieweg et al., CHI 2010, Microblogging during two natural hazard events
•Gupta et al., PSOSM 2012, Credibility ranking of tweets during high impact events
•Hughes et al., ISCRAM 2009, Twitter adoption and use in mass convergence and emergency events
•Mendoza et al., SOMA 2010, Twitter under crisis: Can we trust what we RT?
•Gao et al., IMC 2010, Detecting and characterizing social spam campaigns
•Rahman et al., USENIX 2012, Efficient and scalable socware detection in online social networks
•Thomas et al., IMC 2011, Suspended accounts in retrospect: An analysis of Twitter spam
•Benevenuto et al., SIGIR 2009, Detecting spammers and content promoters in online video social networks

Facebook’s efforts to counter malicious content
•Facebook Immune System, Stein et al., 2011
•“News Feed FYI: Cleaning Up News Feed Spam”, 2014 (http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/)
•Two billion dollar lawsuit against fake “Likes” spam, 2014

Research efforts to combat malicious content on Facebook

Overview
Data
  1. Posts containing URLs
  2. User profiles
Collection Techniques
  1. Snowball sampling
  2. Convenience sampling
  3. Honeypots
Identification Techniques
  1. Unsupervised learning
  2. Supervised learning
Evaluation metrics
  1. Precision / Recall
  2. True positives
  3. False negatives
  4. Purity / Inverse purity
  5. Manual verification

Identification techniques
•Unsupervised learning
  •Clustering based on message similarity
  •Markov Clustering (MCL)

Clustering based on message similarity
•Data - 187 million wall posts
  •2.08 million wall posts containing a URL
•Collection technique - Snowball sampling
•Methodology
  •Post: <description, URL>
  •Same URL / similar description -> cluster
  •“Distributed” coverage + “Bursty” nature = spam cluster
    •distributed nature: > 5 users in a cluster
    •bursty nature: median time between posts < 90 minutes
Detecting and Characterizing Social Spam Campaigns, Gao et al., IMC 2010
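
The two thresholds on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the post fields (`user`, `url`, `time`), the grouping by exact URL only (the similar-description part is left out), and the reading of "median time between posts" as the median gap between consecutive posts are all assumptions made here.

```python
from collections import defaultdict
from statistics import median

MIN_USERS = 5          # slide: "distributed" means more than 5 users in a cluster
MAX_MEDIAN_GAP = 90.0  # slide: "bursty" means a median gap under 90 minutes


def cluster_by_url(posts):
    """Group posts that share the same URL (hypothetical, simplified clustering)."""
    clusters = defaultdict(list)
    for post in posts:
        clusters[post["url"]].append(post)
    return dict(clusters)


def is_spam_cluster(cluster):
    """Apply the distributed and bursty thresholds to one cluster."""
    users = {post["user"] for post in cluster}
    times = sorted(post["time"] for post in cluster)  # minutes since some epoch
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return False  # a single post cannot be bursty
    return len(users) > MIN_USERS and median(gaps) < MAX_MEDIAN_GAP


# Made-up posts: one bursty multi-user cluster, one slow cluster, one single-user cluster.
posts = [{"user": f"u{i}", "url": "http://spam.example/a", "time": 10 * i} for i in range(6)]
posts += [{"user": f"v{i}", "url": "http://news.example/b", "time": 1440 * i} for i in range(6)]
posts += [{"user": "w0", "url": "http://blog.example/c", "time": 5 * i} for i in range(6)]

flagged = sorted(url for url, c in cluster_by_url(posts).items() if is_spam_cluster(c))
print(flagged)
```

Only the first made-up cluster passes both thresholds: the second is spread over days and the third comes from a single user.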

Clustering based on message similarity
•Results
  •1.4 million clusters in total
  •297 clusters (212k posts) satisfied distributed and bursty thresholds
•Evaluation
  •93.9% true positives; verified manually
•Highlights
  •Largest dataset in literature
  •False negative estimation missing
Detecting and Characterizing Social Spam Campaigns, Gao et al., IMC 2010

Markov clustering
•Data - 320 user profiles
  •165 spammers, 155 legitimate
•Collection technique - Convenience sampling
•Methodology
  •Modelled social networks as weighted graph G = (V, E, W)
  •$W(E_{ij}) = |F^{a}_{ij}| + |P_{ij}| + |U_{ij}|$
    •$F^{a}_{ij}$: common active friends
    •$P_{ij}$: common page likes
    •$U_{ij}$: fraction of common URLs
  •Applied Markov Clustering (MCL)
An MCL-based Approach for Spam Profile Detection in Online Social Networks, Ahmed et al., IEEE TrustCom 2012
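
To make the methodology concrete, here is a small pure-Python sketch of the edge weight and of the standard MCL loop (expansion, inflation, column normalisation). It is an illustration under stated assumptions, not the paper's code: the function names, the inflation value of 2, the fixed iteration count, the unit self-loops, and the toy six-profile graph are all chosen here for the example.

```python
def edge_weight(common_active_friends, common_page_likes, common_url_fraction):
    """W(E_ij) as the sum of the three terms named on the slide."""
    return common_active_friends + common_page_likes + common_url_fraction


def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def markov_cluster(weights, inflation=2.0, iterations=50, eps=1e-6):
    """Minimal MCL: alternate expansion (matrix squaring) and inflation."""
    n = len(weights)
    m = [[float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops, a common MCL convention
    normalize_columns(m)
    for _ in range(iterations):
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        m = normalize_columns([[x ** inflation for x in row] for row in expanded])
    # Rows that keep non-zero mass act as attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: profiles 0-2 are strongly tied, profiles 3-5 are tied, no edges between groups.
n = 6
weights = [[0.0] * n for _ in range(n)]
for group, w in (((0, 1, 2), edge_weight(3, 2, 0.5)), ((3, 4, 5), edge_weight(1, 0, 0.5))):
    for a in group:
        for b in group:
            if a != b:
                weights[a][b] = w

clusters = markov_cluster(weights)
print(clusters)
```

On this toy graph the two disconnected groups come out as two clusters. Real profile graphs would need a convergence check and a sparse matrix representation.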

Markov clustering
•Results
  •$F_P$ = 0.88 (harmonic mean of purity and inverse purity)
  •$F_B$ = 0.79 (harmonic mean of precision and recall)
•Highlights
  •Technique uses only 3 features, yet achieves good results
  •Small dataset; could be biased
An MCL-based Approach for Spam Profile Detection in Online Social Networks, Ahmed et al., IEEE TrustCom 2012
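
The two cluster-quality scores mentioned above can be computed as follows. This is a sketch that follows the slide's wording (a harmonic mean of purity and inverse purity); the function names and the six-item toy labelling are made up for illustration, and the paper may use a slightly different formulation.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(labels[x] for x in c).values()) for c in clusters)
    return majority / total


def inverse_purity(clusters, labels):
    """Fraction of items that fall in the best-matching cluster of their true class."""
    classes = {}
    for item, label in labels.items():
        classes.setdefault(label, set()).add(item)
    total = sum(len(members) for members in classes.values())
    best = sum(max(len(members & set(c)) for c in clusters) for members in classes.values())
    return best / total


def harmonic_mean(a, b):
    """F-measure of two scores; 0 when both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Toy example: two clusters of three profiles each, one profile misplaced in each.
clusters = [[1, 2, 3], [4, 5, 6]]
labels = {1: "spam", 2: "spam", 3: "legit", 4: "legit", 5: "legit", 6: "spam"}

p = purity(clusters, labels)
ip = inverse_purity(clusters, labels)
f_p = harmonic_mean(p, ip)
print(p, ip, f_p)
```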

Identification techniques
•Supervised learning
  •Support Vector Machine (SVM)
  •Decision Trees
  •Random Forests

Support Vector Machine
•Data - 7,500 wall posts containing URLs
  •2,500 malicious, 5,000 legitimate
•Collection technique - Convenience sampling
•Methodology
Efficient and Scalable Socware Detection in Online Social Networks, Rahman et al., USENIX Security, 2012

Support Vector Machine
•Results - 60k posts marked malicious out of 40M posts
  •True positive rate: 97% (manually verified)
  •False positive rate: 0.005% (manually verified)
•Highlights
  •Work aimed to detect “socware”
  •Real world deployment - MyPageKeeper
  •Classifier heavily relies on bag-of-words and message similarity
Efficient and Scalable Socware Detection in Online Social Networks, Rahman et al., USENIX Security, 2012
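
For readers unfamiliar with the two notions in the last highlight, the sketch below shows one common way to build a bag-of-words vector and compare two messages with cosine similarity. It is a generic illustration, not MyPageKeeper's feature extraction: the tokenisation rule, the function names and the sample messages are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lower-cased word tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up messages: two near-duplicate scam posts and one unrelated post.
scam_1 = bag_of_words("Free iPad! Click here to claim your free iPad")
scam_2 = bag_of_words("Click here to claim your free iPad")
benign = bag_of_words("Lunch with family today")

similar = cosine_similarity(scam_1, scam_2)
unrelated = cosine_similarity(scam_1, benign)
print(similar > unrelated)
```

Near-duplicate campaign messages score close to 1 while unrelated posts score 0, which is why such features are sensitive to spammers rewording their text.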

Decision trees
•Data - 187 million wall posts
  •2.08 million wall posts containing a URL
•Collection technique - Snowball sampling
•Methodology
Towards Online Spam Filtering in Social Networks, Gao et al., NDSS 2012

Decision trees
•Results
  •True positive rate: 80.9%
  •False positive rate: 0.19%
  •Throughput: 1,580 messages / second
•Highlights
  •Model stays accurate for 9 months after training
  •Model tested on Twitter data as well
  •All “new” instances marked as legitimate; system cannot detect campaigns that were not part of its training
Towards Online Spam Filtering in Social Networks, Gao et al., NDSS 2012

Random forests
•Data - 3,831 user profiles
  •173 spammers, 827 legitimate profiles used for training
•Collection technique - Honeypots
•Methodology
  •Six features extracted from profiles in training set
  •Applied Random forest classifier on feature vectors
Detecting Spammers on Social Networks, Stringhini et al., ACSAC 2010

Random forests
•Results - 10 fold cross validation on training set
  •False positive rate: 2%
  •False negative rate: 1%
  •130 more profiles marked as spam from 790k profiles in test set (7 false positives)
•Highlights
  •Only work to use honeypot approach
  •Approach tested on Twitter too
  •Features used not available publicly (FF ratio, friend choice…)
Detecting Spammers on Social Networks, Stringhini et al., ACSAC 2010
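
The test-set figures on this slide imply a precision for the flagged profiles, worked out below. The function name is illustrative; the inputs 130 and 7 are the numbers quoted on the slide.

```python
def precision_of_flagged(flagged, false_positives):
    """Share of flagged profiles that were actually spam."""
    true_positives = flagged - false_positives
    return true_positives / flagged


precision = precision_of_flagged(flagged=130, false_positives=7)
print(f"{precision:.1%}")  # prints 94.6%
```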

In a nutshell
Columns: User detection | Post detection | Real time | Real world deployment | Validation on other OSNs | Profile features | Content features | Network features
Gao et al., IMC 2010: ✔ ✔
Stringhini et al., ACSAC 2010: ✔ ✔ ✔ ✔
Gao et al., NDSS 2012: ✔ ✔ ✔ ✔ ✔
Rahman et al., USENIX 2012: ✔ ✔ ✔ ✔
Ahmed et al., TrustCom 2012: ✔ ✔ ✔ ✔

Other work on malicious content detection
•Facebook
  •Malicious URL detection on Facebook, Guan et al., JWIS 2011
  •FRAppE: Detecting malicious applications on Facebook, Rahman et al., CoNEXT 2012
•Other OSNs
  •Suspended accounts in retrospect: An analysis of Twitter spam, Thomas et al., IMC 2011
  •Detecting spammers and content promoters in online video social networks, Benevenuto et al., SIGIR 2009
  •Uncovering social spammers: social honeypots + machine learning, Lee et al., SIGIR 2010

Challenges in conducting research on Facebook
•Approximately 28% of users share their data with an audience wider than their friends. ³
•Public data is not rich
  •Network features not accessible publicly
  •Limited user profile features available
•No API endpoint to fetch data in real time
³ http://www.consumerreports.org/cro/magazine/2012/06/facebook-your-privacy/index.htm

Straight from Facebook
•Interaction with Christopher Palow, Engineering Manager, Facebook (fb.com/palow)
  •“We’re not really fond of people who crawl Facebook for research.”
  •“Facebook’s filters don’t catch all the bad content. If something goes undetected in real time, it is usually detected within the next 24 hours. If not, it’s gone forever.”
  •“Web of Trust is mostly reputation based whereas SURBL is spam honey pots. We actually have access to both but we don't keep our integrations well updated.”
  •Sample bash script to look for spammers
grep -P -i -o '_wau.push\(\["small", "\S+", "' /tmp/bad_site.html | cut -d , -f 2 | cut -d '"' -f 2 | awk '{print "http://whos.amung.us/stats/history/" $1 "/"}'

Research gaps
•Large scale studies for detecting malicious content
  •Ignore posts without URLs
  •Work on detecting malicious posts which are part of a campaign
  •Rely partially on passive features (likes, comments etc.) which take time to build up; not suitable for real time detection
  •Don’t rely on profile features
  •Crawled Facebook - ethical implications
    •Did so prior to 2009

Research gaps
•Existing techniques used to detect malicious content on other social networks cannot be directly ported to Facebook
  •Lack of publicly available information
  •Difference in network dynamics and usage

Opportunities
•Efficient, real time, self learning and evolving techniques to detect malicious content
•Study content without URLs
•Light weight client-side solutions for malicious content identification

Thanks!
prateekd@iiitd.ac.in
http://precog.iiitd.edu.in/people/prateek/

Backup slides

Bayes classification
•Methodology
  •Collected 49,110 posts on top 20 Facebook pages from July 2010 to Feb. 2011.
    •5,637 malicious; 43,473 benign
    •Posts made by users on page; NOT the posts made by page itself.
  •Filtered posts containing URLs, looked for URLs blacklisted by PhishTank, SURBL, WOT, Google SafeBrowsing etc.
  •Extracted 7 features (4 URL, 3 message containing the URL)
  •Used Bayes classification model to classify URLs as malicious / benign
Malicious URL Detection on Facebook, Guan et al., JWIS 2011
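
As a rough picture of what a Bayes classifier over URL and message features might look like, here is a minimal Bernoulli naive Bayes in pure Python. The slide does not list the seven features or say which Bayes variant was used, so everything specific here is an assumption: the three binary features, the Laplace smoothing, the function names and the six training rows are hypothetical.

```python
import math


def train_bernoulli_nb(rows, labels):
    """Estimate a log prior and per-feature probabilities for each class."""
    model = {}
    n_features = len(rows[0])
    for cls in sorted(set(labels)):
        members = [row for row, label in zip(rows, labels) if label == cls]
        log_prior = math.log(len(members) / len(rows))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(row[f] for row in members) + 1) / (len(members) + 2)
                 for f in range(n_features)]
        model[cls] = (log_prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest posterior log-probability."""
    def score(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(math.log(p if x else 1 - p) for x, p in zip(row, probs))
    return max(model, key=score)


# Hypothetical binary features per post:
# [URL uses a shortener, URL domain is an IP address, message contains "free"]
rows = [
    [1, 1, 0], [1, 1, 1], [1, 0, 1],   # malicious examples (made up)
    [0, 0, 0], [0, 0, 1], [0, 1, 0],   # benign examples (made up)
]
labels = ["malicious"] * 3 + ["benign"] * 3

model = train_bernoulli_nb(rows, labels)
print(predict(model, [1, 1, 1]), predict(model, [0, 0, 0]))
```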

Bayes classification
•Results
  •Authors report an accuracy of 94.9% (TPR: 95%, FNR: 3.56%)
•Validation
  •Technique validated using top 20 Facebook pages in Europe and Asia
  •Achieved approx. 90% accuracy and TPR in both; performance fell due to skewed malicious:benign ratio in datasets
Malicious URL Detection on Facebook, Guan et al., JWIS 2011

Malicious apps
•FRAppE (Facebook’s Rigorous Application Evaluator)
  •Identified 6,273 malicious apps (from MyPageKeeper) and 6,273 benign apps (from Socialbakers)
  •Characterised app behaviour; extracted on-demand and aggregation-based features
•Observations
  •Malicious apps redirect users to domains with poor reputation
  •97% of malicious apps request only “publish” permissions
FRAppE: Detecting Malicious Facebook Applications, Rahman et al., CoNEXT 2012

Malicious apps
•Results
  •FRAppE Lite
    •Extracts on-demand features - Accuracy: 99%, FN rate: 4.4%
  •FRAppE
    •Extracts on-demand and aggregation-based features - Accuracy: 99.5%, FN rate: 4.1%
•Validation
  •98.5% apps validated as malicious (apps deleted, app name similarity, typo squatting, posted link similarity etc.)
FRAppE: Detecting Malicious Facebook Applications, Rahman et al., CoNEXT 2012

More Related Content

Viewers also liked

Fr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsFr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsPvrtechnologies Nellore
 
Fr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsFr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsCloudTechnologies
 
Detecting malicious facebook applications
Detecting malicious facebook applicationsDetecting malicious facebook applications
Detecting malicious facebook applicationsnexgentech15
 
Frappe ERPNext Open Day February 2014
Frappe ERPNext Open Day February 2014Frappe ERPNext Open Day February 2014
Frappe ERPNext Open Day February 2014rushabh_mehta
 
FRAppE Detecting Malicious Facebook Applications
FRAppE Detecting Malicious Facebook ApplicationsFRAppE Detecting Malicious Facebook Applications
FRAppE Detecting Malicious Facebook ApplicationsNagamalleswararao Tadikonda
 
Application of data mining based malicious code detection techniques for dete...
Application of data mining based malicious code detection techniques for dete...Application of data mining based malicious code detection techniques for dete...
Application of data mining based malicious code detection techniques for dete...UltraUploader
 
Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Selman Bozkır
 
Facebook Attacks - an in-depth analysis
Facebook Attacks - an in-depth analysisFacebook Attacks - an in-depth analysis
Facebook Attacks - an in-depth analysisCyren, Inc
 
Automatic test packet generation
Automatic test packet generationAutomatic test packet generation
Automatic test packet generationtusharjadhav2611
 
Data Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchData Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchQiang Hao
 
ATPG Methods and Algorithms
ATPG Methods and AlgorithmsATPG Methods and Algorithms
ATPG Methods and AlgorithmsDeiptii Das
 

Viewers also liked (17)

Fr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsFr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applications
 
Fr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applicationsFr app e detecting malicious facebook applications
Fr app e detecting malicious facebook applications
 
Android security
Android security Android security
Android security
 
Detecting malicious facebook applications
Detecting malicious facebook applicationsDetecting malicious facebook applications
Detecting malicious facebook applications
 
Frappe ERPNext Open Day February 2014
Frappe ERPNext Open Day February 2014Frappe ERPNext Open Day February 2014
Frappe ERPNext Open Day February 2014
 
FRAppE Detecting Malicious Facebook Applications
FRAppE Detecting Malicious Facebook ApplicationsFRAppE Detecting Malicious Facebook Applications
FRAppE Detecting Malicious Facebook Applications
 
Application of data mining based malicious code detection techniques for dete...
Application of data mining based malicious code detection techniques for dete...Application of data mining based malicious code detection techniques for dete...
Application of data mining based malicious code detection techniques for dete...
 
Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...Identification of User Patterns in Social Networks by Data Mining Techniques:...
Identification of User Patterns in Social Networks by Data Mining Techniques:...
 
Facebook Attacks - an in-depth analysis
Facebook Attacks - an in-depth analysisFacebook Attacks - an in-depth analysis
Facebook Attacks - an in-depth analysis
 
Automatic test packet generation
Automatic test packet generationAutomatic test packet generation
Automatic test packet generation
 
Data Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational ResearchData Mining and Text Mining in Educational Research
Data Mining and Text Mining in Educational Research
 
ATPG Methods and Algorithms
ATPG Methods and AlgorithmsATPG Methods and Algorithms
ATPG Methods and Algorithms
 
Spam Filtering
Spam FilteringSpam Filtering
Spam Filtering
 
IEEE Presentation
IEEE PresentationIEEE Presentation
IEEE Presentation
 
IEEE Standards
IEEE StandardsIEEE Standards
IEEE Standards
 
Who are We Studying: Humans or Bots?
Who are We Studying: Humans or Bots? Who are We Studying: Humans or Bots?
Who are We Studying: Humans or Bots?
 
Growth Hacker le Social Media - Growth Hacking Paris 10
Growth Hacker le Social Media - Growth Hacking Paris 10Growth Hacker le Social Media - Growth Hacking Paris 10
Growth Hacker le Social Media - Growth Hacking Paris 10
 

Similar to Identification and Analysis of Malicious Content on Facebook: A Survey

A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...Paolo Missier
 
Dynamic feature selection for spam detection (1).pptx
Dynamic feature selection for spam detection (1).pptxDynamic feature selection for spam detection (1).pptx
Dynamic feature selection for spam detection (1).pptxRivikaJain
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Data mining in security: Ja'far Alqatawna
Data mining in security: Ja'far AlqatawnaData mining in security: Ja'far Alqatawna
Data mining in security: Ja'far AlqatawnaMaribel García Arenas
 
Risks and Security of Internet and System
Risks and Security of Internet and SystemRisks and Security of Internet and System
Risks and Security of Internet and SystemParam Nanavati
 
Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...ijaia
 
MOBILE DEVICE FORENSICS USING NLP
MOBILE DEVICE FORENSICS USING NLPMOBILE DEVICE FORENSICS USING NLP
MOBILE DEVICE FORENSICS USING NLPAnkita Jadhao
 
MOBILE DEVICE FORENSICS USING NLP
MOBILE DEVICE FORENSICS USING NLPMOBILE DEVICE FORENSICS USING NLP
MOBILE DEVICE FORENSICS USING NLPAnkita Jadhao
 
Integrated approach to detect spam in social media networks using hybrid feat...
Integrated approach to detect spam in social media networks using hybrid feat...Integrated approach to detect spam in social media networks using hybrid feat...
Integrated approach to detect spam in social media networks using hybrid feat...IJECEIAES
 
Terrorism Analysis through Social Media using Data Mining
Terrorism Analysis through Social Media using Data MiningTerrorism Analysis through Social Media using Data Mining
Terrorism Analysis through Social Media using Data MiningIRJET Journal
 
Computational Verification Challenges in Social Media
Computational Verification Challenges in Social MediaComputational Verification Challenges in Social Media
Computational Verification Challenges in Social MediaSymeon Papadopoulos
 
Knowledge SPA IBS Seminar Series
Knowledge SPA IBS Seminar Series Knowledge SPA IBS Seminar Series
Knowledge SPA IBS Seminar Series Norshidah Mohamed
 
Differential Privacy for Information Retrieval
Differential Privacy for Information RetrievalDifferential Privacy for Information Retrieval
Differential Privacy for Information RetrievalGrace Hui Yang
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMichel Dumontier
 

Similar to Identification and Analysis of Malicious Content on Facebook: A Survey (20)

A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
Dynamic feature selection for spam detection (1).pptx
Dynamic feature selection for spam detection (1).pptxDynamic feature selection for spam detection (1).pptx
Dynamic feature selection for spam detection (1).pptx
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Data mining in security: Ja'far Alqatawna
Data mining in security: Ja'far AlqatawnaData mining in security: Ja'far Alqatawna
Data mining in security: Ja'far Alqatawna
 
Risks and Security of Internet and System
Risks and Security of Internet and SystemRisks and Security of Internet and System
Risks and Security of Internet and System
 
Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
Identification and Analysis of Malicious Content on Facebook: A Survey

  • 1. Identification and Analysis of Malicious Content on Facebook: A Survey
      Prateek Dewan (PhD1111)
      Comprehensive Examination, November 24, 2014
      Committee Members:
      • Dr. Sambuddho Chakravarty (IIITD)
      • Dr. Anupam Joshi (IIITD / UMBC)
      • Dr. Anand Kashyap (Symantec Research)
      • Dr. Ponnurangam K. (Advisor)
  • 2. Outline (cerc.iiitd.ac.in)
      • What is Facebook?
      • Why Facebook?
      • Malicious content on Facebook
      • Identification techniques
      • Research gaps
      • Opportunities
  • 3. What is Facebook?
      • World’s biggest Online Social Network (OSN)
      • Users connect by mutual consent
      • Used to:
          • keep in touch with friends and family
          • share what people are up to
          • consume information about real world events [1]
      [1] http://www.pewresearch.org/fact-tank/2014/09/24/how-social-media-is-reshaping-news/
  • 4. Why Facebook?
      • Largest online social network in the world
          • 1.32 billion monthly active users
          • 4.75 billion posts per day
          • Over 300 petabytes of data
      • Spammers exploit context of event to lure victims into scams.
      • Facebook spammers make $200 million just by posting links. [2]
      [2] http://www.theguardian.com/technology/2013/aug/28/facebook-spam-202-million-italian-research
  • 7. Types of malicious content on Facebook: Advertising Scams Fake information Malware
  • 8. Focus: Online Social Media, Malicious Content, Events
      • Vieweg et al., CHI 2010, Microblogging during two natural hazard events
      • Gupta et al., PSOSM 2012, Credibility ranking of tweets during high impact events
      • Hughes et al., ISCRAM 2009, Twitter adoption and use in mass convergence and emergency events
      • Mendoza et al., SOMA 2010, Twitter under crisis: Can we trust what we RT?
      • Gao et al., IMC 2010, Detecting and characterizing social spam campaigns
      • Rahman et al., USENIX 2012, Efficient and scalable socware detection in online social networks
      • Thomas et al., IMC 2011, Suspended accounts in retrospect: An analysis of Twitter spam
      • Benevenuto et al., SIGIR 2009, Detecting spammers and content promoters in online video social networks
  • 10. Facebook’s efforts to counter malicious content
      • Facebook Immune System, Stein et al., 2011
      • “News Feed FYI: Cleaning Up News Feed Spam”, 2014 (http://newsroom.fb.com/news/2014/04/news-feed-fyi-cleaning-up-news-feed-spam/)
      • Two billion dollar lawsuit against fake “Likes” spam, 2014
  • 11. Research efforts to combat malicious content on Facebook
  • 12. Overview
      • Data: 1. Posts containing URLs; 2. User profiles
      • Identification techniques: 1. Unsupervised learning; 2. Supervised learning
      • Evaluation metrics: 1. Precision / Recall; 2. True positives; 3. False negatives; 4. Purity / Inverse purity; 5. Manual verification
      • Collection techniques: 1. Snowball sampling; 2. Convenient sampling; 3. Honeypots
  • 14. Identification techniques
      • Unsupervised learning
          • Clustering based on message similarity
          • Markov Clustering (MCL)
  • 16. Clustering based on message similarity (Detecting and Characterizing Social Spam Campaigns, Gao et al., IMC 2010)
      • Data: 187 million wall posts
          • 2.08 million wall posts containing a URL
      • Collection technique: Snowball sampling
      • Methodology:
          • Each post is modelled as a <description, URL> pair
          • Posts with the same URL or a similar description are placed in the same cluster
          • A cluster with “distributed” coverage and a “bursty” nature is treated as a spam cluster
          • Distributed nature: more than 5 users in a cluster
          • Bursty nature: median time between posts under 90 minutes
  • 17. Clustering based on message similarity (Detecting and Characterizing Social Spam Campaigns, Gao et al., IMC 2010)
      • Results:
          • 1.4 million clusters in total
          • 297 clusters (212k posts) satisfied the distributed and bursty thresholds
      • Evaluation:
          • 93.9% true positives; verified manually
      • Highlights:
          • Largest dataset in literature
          • False negative estimation missing
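The two thresholds quoted for Gao et al. (more than 5 users, median gap under 90 minutes) can be sketched as a small filter over one cluster. This is a minimal sketch, not the authors' implementation: the input shape (a list of `(user_id, timestamp_in_minutes)` pairs) and the function name `is_spam_cluster` are assumptions made for illustration, and the step that builds clusters from URL or description similarity is left out.

```python
from statistics import median

# Thresholds as quoted on the slide (Gao et al., IMC 2010).
MIN_USERS = 5             # "distributed": more than 5 distinct users
MAX_MEDIAN_GAP_MIN = 90   # "bursty": median gap between posts under 90 minutes


def is_spam_cluster(posts):
    """posts: list of (user_id, timestamp_in_minutes) pairs for ONE cluster.

    The tuple layout is a hypothetical input format, not the paper's.
    """
    users = {user for user, _ in posts}
    times = sorted(t for _, t in posts)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return False  # a single post has no inter-post gap to measure
    distributed = len(users) > MIN_USERS
    bursty = median(gaps) < MAX_MEDIAN_GAP_MIN
    return distributed and bursty


# Eight different users posting ten minutes apart.
campaign = [(f"u{i}", 10 * i) for i in range(8)]
# Two users posting many hours apart.
chatter = [("alice", 0), ("bob", 600), ("alice", 1500)]
```

Both conditions must hold, so a popular link shared slowly by many users, or a burst of posts from one or two accounts, is not flagged by this filter.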
  • 19. Markov clustering (An MCL-based Approach for Spam Profile Detection in Online Social Networks, Ahmed et al., IEEE TrustCom 2012)
      • Data: 320 user profiles
          • 165 spammers, 155 legitimate
      • Collection technique: Convenient sampling
      • Methodology:
          • Modelled social networks as a weighted graph $G = (V, E, W)$
          • $W(E_{ij}) = |F^{a}_{ij}| + |P_{ij}| + |U_{ij}|$, where $F^{a}_{ij}$ is the common active friends, $P_{ij}$ the common page likes, and $U_{ij}$ the fraction of common URLs of profiles $i$ and $j$
          • Applied Markov Clustering (MCL)
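The edge weight and the clustering step on this slide can be illustrated with a short, dependency-free sketch. Several things here are assumptions rather than details from Ahmed et al.: the "fraction of common URLs" is computed as shared URLs over the union of both profiles' URLs, the MCL loop is the textbook expansion and inflation iteration with self-loops, and the parameter values (`inflation=2.0`, `iterations=30`, `eps=1e-6`) are illustrative defaults.

```python
def edge_weight(friends_i, friends_j, likes_i, likes_j, urls_i, urls_j):
    """W(E_ij): common active friends + common page likes + fraction of common URLs.

    The URL fraction (shared / union) is an assumed definition.
    """
    all_urls = urls_i | urls_j
    url_fraction = len(urls_i & urls_j) / len(all_urls) if all_urls else 0.0
    return len(friends_i & friends_j) + len(likes_i & likes_j) + url_fraction


def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(weights, inflation=2.0, iterations=30, eps=1e-6):
    """Textbook Markov Clustering on a symmetric weight matrix (list of lists)."""
    n = len(weights)
    # Add self-loops, then turn weights into transition probabilities.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one matrix squaring spreads flow along longer walks.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        # Inflation: element-wise power strengthens strong flows, weakens weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each attractor row (non-zero diagonal) lists the members of one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return sorted(sorted(c) for c in clusters)


# Two groups of profiles with no edges between the groups.
toy_weights = [
    [0, 3, 3, 0, 0],
    [3, 0, 3, 0, 0],
    [3, 3, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 2, 0],
]
```

On a real graph the weights would come from `edge_weight` for every pair of profiles; the toy matrix only shows the call shape, and a sparse matrix library would be needed beyond a few hundred nodes.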
  • 20. Markov clustering (An MCL-based Approach for Spam Profile Detection in Online Social Networks, Ahmed et al., IEEE TrustCom 2012)
      • Results:
          • $F_P$ = 0.88 (harmonic mean of purity and inverse purity)
          • $F_B$ = 0.79 (harmonic mean of precision and recall)
      • Highlights:
          • Technique uses only 3 features, yet achieves good results
          • Small dataset; could be biased
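The purity-based F-measure reported here can be made concrete with the standard definitions of purity and inverse purity. This is a sketch under the assumption that the paper uses those standard definitions; the function names and the toy clusters are hypothetical and are not data from the study.

```python
def purity(clusters, classes):
    """Share of items that fall in the best-matching class of their cluster.

    clusters, classes: lists of sets over the same items.
    """
    total = sum(len(c) for c in clusters)
    return sum(max(len(c & k) for k in classes) for c in clusters) / total


def inverse_purity(clusters, classes):
    """Purity with the roles of clusters and gold classes swapped."""
    return purity(classes, clusters)


def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Toy example: six profiles, two predicted clusters, two gold classes.
predicted = [{1, 2, 3}, {4, 5, 6}]
gold = [{1, 2, 3, 4}, {5, 6}]

f_p = harmonic_mean(purity(predicted, gold), inverse_purity(predicted, gold))
```

Purity rewards clusters that contain a single class, inverse purity rewards classes that are kept together in one cluster, and the harmonic mean penalizes a clustering that scores well on only one of the two.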
  • 21. Summary
      • Data: 1. Posts containing URLs; 2. User profiles
      • Identification techniques: 1. Unsupervised learning; 2. Supervised learning
      • Evaluation metrics: 1. Precision / Recall; 2. True positives; 3. False negatives; 4. Purity / Inverse purity; 5. Manual verification
      • Collection techniques: 1. Snowball sampling; 2. Convenient sampling; 3. Honeypots
  • 22. Identification techniques
      • Supervised learning
          • Support Vector Machine (SVM)
          • Decision Trees
          • Random Forests
  • 24. Support Vector Machine (Efficient and Scalable Socware Detection in Online Social Networks, Rahman et al., USENIX Security, 2012)
      • Data: 7,500 wall posts containing URLs
          • 2,500 malicious, 5,000 legitimate
      • Collection technique: Convenient sampling
      • Methodology: [figure]
  • 25. Support Vector Machine (Efficient and Scalable Socware Detection in Online Social Networks, Rahman et al., USENIX Security, 2012)
      • Results: 60k posts marked malicious out of 40M posts
          • True positive rate: 97% (manually verified)
          • False positive rate: 0.005% (manually verified)
      • Highlights:
          • Work aimed to detect “socware”
          • Real world deployment: MyPageKeeper
          • Classifier heavily relies on bag-of-words and message similarity
  • 27. Decision trees (Towards Online Spam Filtering in Social Networks, Gao et al., NDSS 2012)
      • Data: 187 million wall posts
          • 2.08 million wall posts containing a URL
      • Collection technique: Snowball sampling
      • Methodology: [figure]
  • 28. Decision trees (Towards Online Spam Filtering in Social Networks, Gao et al., NDSS 2012)
      • Results:
          • True positive rate: 80.9%
          • False positive rate: 0.19%
          • Throughput: 1,580 messages / second
      • Highlights:
          • Model stays accurate even 9 months after training
          • Model tested on Twitter data as well
          • All “new” instances are marked as legitimate; the system cannot detect campaigns that were not part of its training data
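The true positive and false positive rates quoted across these studies come from confusion-matrix counts, and a small helper makes the relationship explicit. The counts below are illustrative numbers chosen only so that the rates match the 80.9% and 0.19% quoted on this slide; they are not the paper's data, and `confusion_rates` is a hypothetical helper name.

```python
def confusion_rates(tp, fp, tn, fn):
    """Rates derived from confusion-matrix counts (spam is the positive class)."""
    return {
        "tpr": tp / (tp + fn),        # share of spam that is caught (recall)
        "fpr": fp / (fp + tn),        # share of legitimate posts wrongly flagged
        "fnr": fn / (tp + fn),        # share of spam that is missed
        "precision": tp / (tp + fp),  # share of flagged posts that are spam
    }


# Illustrative counts only: 1,000 spam posts and 10,000 legitimate posts.
rates = confusion_rates(tp=809, fp=19, tn=9981, fn=191)
```

Because legitimate posts far outnumber spam on a real network, even a small false positive rate can translate into many wrongly flagged posts, which is why the studies report it alongside the true positive rate.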
  • 30. Random forests (Detecting Spammers on Social Networks, Stringhini et al., ACSAC 2010)
      • Data: 3,831 user profiles
          • 173 spammers, 827 legitimate profiles used for training
      • Collection technique: Honeypots
      • Methodology:
          • Six features extracted from profiles in training set
          • Applied Random forest classifier on feature vectors
  • 31. Random forests (Detecting Spammers on Social Networks, Stringhini et al., ACSAC 2010)
      • Results: 10-fold cross validation on training set
          • False positive rate: 2%
          • False negative rate: 1%
          • 130 more profiles marked as spam from 790k profiles in test set (7 false positives)
      • Highlights:
          • Only work to use honeypot approach
          • Approach tested on Twitter too
          • Features used not available publicly (FF ratio, friend choice…)
  • 33. In a nutshell
      Criteria compared: User detection, Post detection, Real time, Real world deployment, Validation on other OSNs, Profile features, Content features, Network features
      • Gao et al., 2010, IMC: ✔ ✔
      • Stringhini et al., ACSAC 2010: ✔ ✔ ✔ ✔
      • Gao et al., 2012, NDSS: ✔ ✔ ✔ ✔ ✔
      • Rahman et al., 2012, USENIX: ✔ ✔ ✔ ✔
      • Ahmed et al., 2012, TrustCom: ✔ ✔ ✔ ✔
  • 34. Other work on malicious content detection
      • Facebook
          • Malicious URL detection on Facebook, Guan et al., JWIS 2011
          • FRAppE: Detecting malicious applications on Facebook, Rahman et al., CoNEXT 2012
      • Other OSNs
          • Suspended accounts in retrospect: An analysis of Twitter spam, Thomas et al., IMC 2011
          • Detecting spammers and content promoters in online video social networks, Benevenuto et al., SIGIR 2009
          • Uncovering social spammers: social honeypots + machine learning, Lee et al., SIGIR 2010
  • 35. Challenges in conducting research on Facebook
      • Approximately 28% of users share their data with an audience wider than their friends. [3]
      • Public data is not rich
          • Network features are not accessible publicly
          • Limited user profile features are available
      • No API endpoint to fetch data in real time
      [3] http://www.consumerreports.org/cro/magazine/2012/06/facebook-your-privacy/index.htm
  • 36. Straight from Facebook
      • Interaction with Christopher Palow, Engineering Manager, Facebook (fb.com/palow)
          • “We’re not really fond of people who crawl Facebook for research.”
          • “Facebook’s filters don’t catch all the bad content. If something goes undetected in real time, it is usually detected within the next 24 hours. If not, it’s gone forever.”
          • “Web of Trust is mostly reputation based whereas SURBL is spam honey pots. We actually have access to both but we don't keep our integrations well updated.”
      • Sample bash script to look for spammers:
          grep -P -i -o '_wau\.push\(\["small", "\S+", "' /tmp/bad_site.html | cut -d , -f 2 | cut -d '"' -f 2 | awk '{print "http://whos.amung.us/stats/history/" $1 "/"}'
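The shell one-liner quoted on this slide pulls a whos.amung.us tracker ID out of a saved page and prints the matching stats-history URL. A Python sketch of the same idea follows. It assumes the pattern in the slide was meant to match a `_wau.push(["small", "<id>", "` call with `\S+` for the ID, since the extracted text appears to have lost its backslashes. The function name and the sample HTML are hypothetical.

```python
import re

# Matches _wau.push(["small", "<tracker id>", " and captures the tracker id.
# Case-insensitive, like the -i flag in the quoted grep command.
WAU_CALL = re.compile(r'_wau\.push\(\["small",\s*"([^"\s]+)",\s*"', re.IGNORECASE)


def tracker_history_urls(html):
    """Return one whos.amung.us history URL per tracker id found in the page."""
    return [
        f"http://whos.amung.us/stats/history/{tracker_id}/"
        for tracker_id in WAU_CALL.findall(html)
    ]


# Hypothetical page fragment; a real run would read the saved HTML file instead.
sample_html = '<script>_wau.push(["small", "abc123xyz", "q1w"]);</script>'
urls = tracker_history_urls(sample_html)
```

The point of the lookup, as far as the slide suggests, is that a shared tracker ID can tie several suspicious pages back to the same operator.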
  • 37. Research gaps
      • Large scale studies for detecting malicious content:
          • Ignore posts without URLs
          • Work on detecting malicious posts which are part of a campaign
          • Rely partially on passive features (likes, comments etc.) which take time to build up; not suitable for real time detection
          • Don’t rely on profile features
          • Crawled Facebook: ethical implications
              • Did so prior to 2009
  • 38. Research gaps
      • Existing techniques used to detect malicious content on other social networks cannot be directly ported to Facebook
          • Lack of publicly available information
          • Difference in network dynamics and usage
  • 39. Opportunities
      • Efficient, real time, self learning and evolving techniques to detect malicious content
      • Study content without URLs
      • Light weight client-side solutions for malicious content identification
  • 40. Thanks!
      prateekd@iiitd.ac.in
      http://precog.iiitd.edu.in/people/prateek/
      cerc.iiitd.ac.in
  • 42. Bayes classification (Malicious URL Detection on Facebook, Guan et al., JWIS 2011)
      • Methodology:
          • Collected 49,110 posts on the top 20 Facebook pages from July 2010 to Feb. 2011
              • 5,637 malicious; 43,473 benign
              • Posts made by users on the page, NOT the posts made by the page itself
          • Filtered posts containing URLs, looked for URLs blacklisted by PhishTank, SURBL, WOT, Google SafeBrowsing etc.
          • Extracted 7 features (4 from the URL, 3 from the message containing the URL)
          • Used a Bayes classification model to classify URLs as malicious / benign
  • 43. Bayes classification (Malicious URL Detection on Facebook, Guan et al., JWIS 2011)
      • Results:
          • Authors report an accuracy of 94.9% (TPR: 95%, FNR: 3.56%)
      • Validation:
          • Technique validated using the top 20 Facebook pages in Europe and Asia
          • Achieved approx. 90% accuracy and TPR in both; performance fell due to the skewed malicious:benign ratio in the datasets
  • 44. Malicious apps (FRAppE: Detecting Malicious Facebook Applications, Rahman et al., CoNEXT 2012)
      • FRAppE (Facebook’s Rigorous Application Evaluator)
          • Identified 6,273 malicious apps (from MyPageKeeper) and 6,273 benign apps (from Social Bakers)
          • Characterised app behaviour; extracted on-demand and aggregation-based features
      • Observations:
          • Malicious apps redirect users to domains with poor reputation
          • 97% of malicious apps request only “publish” permissions
  • 45. Malicious apps (FRAppE: Detecting Malicious Facebook Applications, Rahman et al., CoNEXT 2012)
      • Results:
          • FRAppE Lite
              • Extracts on-demand features. Accuracy: 99%, FN rate: 4.4%
          • FRAppE
              • Extracts on-demand and aggregation-based features. Accuracy: 99.5%, FN rate: 4.1%
      • Validation:
          • 98.5% apps validated as malicious (apps deleted, app name similarity, typo squatting, posted link similarity etc.)