Using Online Activity as Digital Fingerprints to Create a Better Spear Phisher

Black Hat presentation on the concepts and development of µphisher, a social engineering support tool which leverages social network data in order to support the production of forged content.

µphisher can be downloaded at https://github.com/urma/microphisher/ and is released under the GPLv3 license.

Published in: Technology, Education
  • Start: 00:00
  • Time: 02:00. Max time: 05:00. What it does. What it does not do. What we can achieve. What we cannot achieve. Use this as a hook for the motivation.
  • Time: 05:00. Max time: 07:00.
  • Time: 20:00. Max time: 25:00.
  • Time: 07:00. Max time: 17:00.
  • Time: 07:00. Max time: 10:00.
  • Some networks are more specialized and give the user little control over the content produced (e.g., Foursquare, Untappd, TripIt). This can be particularly hard when content is published across linked social network accounts (e.g., Foursquare publishing check-ins on Twitter). Time: 11:00. Max time: 15:00.
  • Time: 15:00. Max time: 20:00.
  • Time: 20:00. Max time: 25:00.
  • Time: 25:00. Max time: 30:00. An unconventional data mining concept, something close to social data mining: a way of selecting the criteria relevant to building the profile. Data cleaning consists of removing records which are not relevant to a given analysis (e.g., removing all retweets, or removing all Facebook status updates posted from a mobile device). Data integration consists of correlating identities between different data sources, allowing the µphisher user to analyze records from multiple sources (e.g., user @jespinhara on Twitter is user 'jespinhara' on Instagram). Data normalization consists of applying transformations to data attributes so that records can be compared and correlated consistently (e.g., all date and time attributes are represented using ISO 8601).
  • Association rule learning, which consists of identifying relationships between variables (e.g., @jespinhara is commonly referenced alongside @urma when "lunch" is present in the status update). Clustering, which consists of discovering groups and structures among records which do not share a known common attribute (e.g., all status updates which mention meals during the day). Anomaly detection, which consists of identifying unusual data records which require further investigation and should most likely be excluded from analysis (e.g., "Hacked by @effffn!!!!111!"). Max time: 50:00.
  • Time: 50:00. Max time: 55:00.
  • Time: 55:00. Max time: 60:00.
  • Time: 70:00. Max time: 75:00.
  • Time: 70:00. Max time: 75:00.
  • Time: 70:00. Max time: 75:00.
  • Validating textual input against the profile to estimate the probability that a text was written by the unsub (e.g., checking whether a given tweet was indeed authored by the profiled individual). Assisting the user in writing content similar to what the unsub would write in terms of word frequency, sentence length, and number of sentences and phrases, among other criteria. Time: 90:00. Max time: 95:00.
  • Time: 105:00. Max time: 110:00.
  • Time: 110:00. Max time: 115:00. Practical limitations: Twitter limits tweet fetching to around 3,200 tweets. All official APIs are rate-limited; handling those limits through asynchronous threads while providing meaningful feedback to the µphisher user can be hard. Some metrics require NLP processing, which fails HARD on partial sentences and abbreviations.
  • Time: 120:00. Max time: 140:00.
  • Time: 140:00. Max time: 145:00. Machine learning => partial content processing => context-aware autocomplete suggestions.
  • Time: 145:00. Max time: 150:00. Noise is bad. Linked social networks are bad. Handling quoted content is hard.
  • Using Online Activity as Digital Fingerprints to Create a Better Spear Phisher
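The three pre-processing steps described in the speaker notes (data cleaning, data integration, data normalization) can be sketched as a small pipeline. This is a minimal illustration in Python, not µphisher's actual code (µphisher is a Ruby on Rails application). The record fields, the `IDENTITIES` map, the `FORMATS` table, and the timestamp layouts are all assumptions made for the example.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw records; field names and timestamp layouts are assumptions.
RAW = [
    {"source": "twitter", "user": "@jespinhara",
     "text": "Had lunch with @urma and @jespinhara today #tgif #lunch",
     "created_at": "Wed Jun 05 12:30:00 +0000 2013"},
    {"source": "twitter", "user": "@jespinhara",
     "text": "RT @SpiderLabs: new blog post",
     "created_at": "Wed Jun 05 13:00:00 +0000 2013"},
    {"source": "instagram", "user": "jespinhara",
     "text": "Lunch!",
     "created_at": "2013-06-05 12:45:00 +0000"},
]

# Hypothetical identity map: (source, account) -> profiled subject (unsub).
IDENTITIES = {
    ("twitter", "@jespinhara"): "Joaquim Espinhara",
    ("instagram", "jespinhara"): "Joaquim Espinhara",
}

# Assumed per-source timestamp formats for strptime.
FORMATS = {
    "twitter": "%a %b %d %H:%M:%S %z %Y",
    "instagram": "%Y-%m-%d %H:%M:%S %z",
}

HASHTAG = re.compile(r"\s*#\w+")


def clean(records):
    """Data cleaning: drop retweets and strip hashtags from the text."""
    for r in records:
        if r["text"].startswith("RT @"):
            continue
        yield {**r, "text": HASHTAG.sub("", r["text"]).strip()}


def integrate(records, identities):
    """Data integration: attach one unsub to accounts from different sources."""
    for r in records:
        unsub = identities.get((r["source"], r["user"]))
        if unsub is not None:
            yield {**r, "unsub": unsub}


def normalize(records, formats):
    """Data normalization: rewrite every timestamp as ISO 8601 in UTC."""
    for r in records:
        ts = datetime.strptime(r["created_at"], formats[r["source"]])
        yield {**r, "created_at": ts.astimezone(timezone.utc).isoformat()}


records = list(normalize(integrate(clean(RAW), IDENTITIES), FORMATS))
for r in records:
    print(r["unsub"], r["created_at"], r["text"])
```

Each stage is a generator, so stages can be reordered or skipped per analysis; which records count as irrelevant (here, retweets) is a choice left to the analyst.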

    1. Using Online Activity as Digital Fingerprints to Create a Better Spear Phisher. Joaquim Espinhara and Ulisses Albuquerque. JEspinhara@trustwave.com, UAlbuquerque@trustwave.com
    2. Agenda: • Introduction • Motivation • Background • HowStuffWorks • µphisher • Demo • Future Work • Conclusion
    3. About us: • Joaquim Espinhara – Security Consultant at Trustwave SpiderLabs • Ulisses Albuquerque – Coder for offense & defense… as long as it’s fun! – Lab Manager at Trustwave SpiderLabs
    4. INTRODUCTION
    5. OUR MOTIVATION
    6. Our Motivation: • Why? • Tools available
    7. BACKGROUND
    8. Background: • Social Networks • Social Engineering • Data Pre-Processing • Natural Language Processing (NLP)
    9. Background: • Social Networks: Facebook, Twitter, LinkedIn, Others
    10. Background: • Social Networks – Communication channel for keeping in touch with someone (Facebook, Twitter) – Media sharing (Instagram) – Specialized networks (Foursquare, GetGlue, TripIt)
    11. Background: • Social Engineering – Phishing (image: http://www.d00med.net/uploads/0d832c77559a2070a766f899e7eg783.png)
    12. Background: • Data Pre-Processing – What is it? – How do we use it?
    13. Background: • Data Pre-Processing Flow. Raw data set: "Had lunch with @urma and @jespinhara today #tgif #lunch". Data cleaning: "Had lunch with @urma and @jespinhara today". Data integration: "Had lunch with @urma and @jespinhara today". Data normalization: "Had lunch with @urma and @jespinhara today (2013-06-05)".
    14. Background: • Natural Language Processing (NLP) – What is it? – How do we use it? – Text analysis
    15. HOWSTUFFWORKS
    16. Our Approach: Identifying the subject to profile; Collecting social network data; Analyzing and building the profile
    17. Our Approach: • The Unknown Subject (Unsub). Joaquim Espinhara: @jespinhara (Twitter), joaquim.espinhara (Facebook), uid=12345 (LinkedIn)
    18. Our Approach: • Data Collection – Social Network IDs – Official APIs – Web Scraping – OAuth
    19. Our Approach: • Data Collection - Twitter. Diagram labels: Application ID (µphisher), User ID (@jespinhara), Twitter, @urma, @effffn, @SpiderLabs
    20. µPHISHER
    21. µphisher: • Reference implementation • Goals – Validate potential unsub content – Assisted textual content input
    22. µphisher: • Web Application • Twitter only (for now) • Open Source (GPLv3)
    23. µphisher: Ruby on Rails; MongoDB + Mongoid; DelayedJob; OAuth; treat (Stanford NLP Core)
    24. µphisher: Authentication; Unsub Registration; Data Source Registration; Data Collection; Work Set Definition; Work Set Analysis; Unsub Profile
    25. DEMO (FINGERS CROSSED)
    26. LIMITATIONS
    27. Future Work: • Support for additional data sources • Machine learning • More metrics and feedback for assisted input • Filtering presets • Adequate handling of quoted content
    28. CONCLUSION
    29. THANK YOU! @urma @jespinhara
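The first goal listed on the µphisher slide, validating potential unsub content, can be illustrated with a very small stylometric profile. The sketch below is in Python and is not how µphisher computes its metrics (the slides list Ruby on Rails and the treat NLP library for that). It assumes a profile made of word frequencies and mean sentence length, and it uses cosine similarity of word counts as a stand-in score. The names `build_profile` and `similarity` are hypothetical.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z0-9@#']+")


def tokens(text):
    """Lowercase word tokens; @mentions and #hashtags are kept as words."""
    return WORD.findall(text.lower())


def sentences(text):
    """Naive sentence split on ., ! and ? (an assumption, not real NLP)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def build_profile(texts):
    """Collect word frequencies and mean sentence length over known texts."""
    words = Counter()
    lengths = []
    for t in texts:
        words.update(tokens(t))
        lengths.extend(len(tokens(s)) for s in sentences(t))
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return {"words": words, "mean_sentence_len": mean_len}


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(profile, text):
    """Score in [0, 1]: how closely the text's vocabulary matches the profile."""
    return cosine(profile["words"], Counter(tokens(text)))


profile = build_profile([
    "Had lunch with @urma today.",
    "Lunch with @urma again! Great food.",
])
likely = similarity(profile, "Lunch with @urma today")
unlikely = similarity(profile, "Quarterly revenue exceeded forecasts")
print(likely > unlikely)
```

A score like this only compares vocabulary. The notes also mention sentence length and the number of sentences and phrases, which would need to be weighed alongside it, and short or abbreviated texts give such metrics little to work with.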
