Black Hat presentation on the concepts and development of µphisher, a social engineering support tool which leverages social network data in order to support the production of forged content.
µphisher can be downloaded at https://github.com/urma/microphisher/ and is released under the GPLv3 license.
Using Online Activity as Digital Fingerprints to Create a Better Spear Phisher
1. Using Online Activity as Digital Fingerprints to Create a Better Spear Phisher
Joaquim Espinhara and Ulisses Albuquerque
JEspinhara@trustwave.com UAlbuquerque@trustwave.com
3. • Joaquim Espinhara
– Security Consultant at Trustwave Spiderlabs
• Ulisses Albuquerque
– Coder for offense & defense… as long as it’s fun!
– Lab Manager at Trustwave Spiderlabs
About us
10. • Social Networks
– Communication channel for keeping in touch with someone (Facebook, Twitter)
– Media sharing (Instagram)
– Specialized networks (Foursquare, GetGlue, TripIt)
Background
11. • Social Engineering
– Phishing
Background
http://www.d00med.net/uploads/0d832c77559a2070a766f899e7eg783.png
13. • Data Pre-Processing Flow
– Raw data set: "Had lunch with @urma and @jespinhara today #tgif #lunch"
– Data cleaning: "Had lunch with @urma and @jespinhara today"
– Data integration: "Had lunch with @urma and @jespinhara today"
– Data normalization: "Had lunch with @urma and @jespinhara today (2013-06-05)"
Background
14. • Natural Language Processing – NLP
– What is it?
– How do we use it?
– Text analysis
Background
27. • Support for additional data sources
• Machine learning
• More metrics and feedback for assisted input
• Filtering presets
• Adequate handling of quoted content
Future Work
What it does. What it does not do. What we can achieve. What we cannot achieve. Use this as a hook for motivation.
Some networks are more specialized and give the user little control over the content produced (e.g., Foursquare, Untappd, TripIt). This can be particularly hard when content is published across linked social network accounts (e.g., Foursquare publishing check-ins on Twitter).
A non-conventional data mining concept, something close to social data mining: a way of selecting the criteria that are relevant for creating the profile.
Data cleaning consists of removing records which are not relevant to a given analysis (e.g., removing all retweets, removing all Facebook status updates posted from a mobile device).
Data integration consists of correlating identities between different data sources, allowing the µphisher user to perform analysis on records from multiple sources (e.g., user @jespinhara on Twitter is user 'jespinhara' on Instagram).
Data normalization consists of applying transformations to data attributes in order to make records consistent for comparison and correlation (e.g., all date and time attributes will be represented using ISO 8601).
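The three pre-processing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not µphisher's actual code: the `Record` type, the `IDENTITIES` map, the `FORMATS` table, and the "RT @" retweet test are all hypothetical, and timestamps without a zone are assumed to be UTC.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    source: str        # e.g. "twitter" or "instagram"
    handle: str        # account name on that source
    text: str
    posted_at: str     # timestamp in the source's own format
    person: str = ""   # canonical identity, filled in by integrate()


# Hypothetical identity map: (source, handle) -> canonical person.
IDENTITIES = {
    ("twitter", "jespinhara"): "jespinhara",
    ("instagram", "jespinhara"): "jespinhara",
}

# Hypothetical timestamp format used by each source.
FORMATS = {
    "twitter": "%a %b %d %H:%M:%S %z %Y",
    "instagram": "%Y-%m-%d %H:%M:%S",
}


def clean(records):
    """Data cleaning: drop records not relevant to the analysis (here, retweets)."""
    return [r for r in records if not r.text.startswith("RT @")]


def integrate(records):
    """Data integration: map each (source, handle) pair to one canonical person."""
    return [
        replace(r, person=IDENTITIES.get((r.source, r.handle), r.handle))
        for r in records
    ]


def normalize(records):
    """Data normalization: rewrite every timestamp as ISO 8601 in UTC."""
    out = []
    for r in records:
        parsed = datetime.strptime(r.posted_at, FORMATS[r.source])
        if parsed.tzinfo is None:
            # Assumption: timestamps without a zone are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        iso = parsed.astimezone(timezone.utc).isoformat()
        out.append(replace(r, posted_at=iso))
    return out
```

Chaining the steps as `normalize(integrate(clean(records)))` mirrors the order of the flow on the slide.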
Association rule learning, which consists of identifying relationships between variables (e.g., @jespinhara is commonly referenced alongside @urma when "lunch" is present in the status update).
Clustering, which consists of discovering groups and structures between records which do not share a known common attribute (e.g., all status updates which mention meals during the day).
Anomaly detection, which consists of identifying unusual data records which require further investigation and should most likely be excluded from analysis (e.g., "Hacked by @effffn!!!!111!").
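As a rough illustration of the association idea only (clustering and anomaly detection are not covered), the sketch below measures how often each mention appears in status updates containing a given keyword. This is a simple co-occurrence count, a stand-in for a full association rule learner; the function name `mention_confidence` and the mention regex are assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a mention is "@" followed by word characters.
MENTION = re.compile(r"@\w+")


def mention_confidence(updates, keyword):
    """Return {mention: share of keyword-bearing updates that include it}."""
    matching = [u for u in updates if keyword in u.lower()]
    if not matching:
        return {}
    counts = Counter()
    for update in matching:
        # Count each mention at most once per update.
        counts.update({m.lower() for m in MENTION.findall(update)})
    return {mention: n / len(matching) for mention, n in counts.items()}
```

A high share for `@urma` under the keyword "lunch" would correspond to the kind of relationship described in the note above.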
Validating textual input against the profile in order to estimate the probability of some text having been written by an unsub (e.g., checking whether a given tweet was indeed authored by a profiled individual).
Assisting the user in writing content which is similar to what the unsub would write in terms of word frequency, sentence length, and number of sentences and phrases, among other criteria.
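To make the two uses above concrete, here is a minimal sketch that builds a profile from known texts and compares a candidate text against it using two of the criteria named in the note: word frequency and sentence length. The function names, the tokenizer regexes, and the two output metrics are assumptions for illustration; the result is a pair of crude similarity signals, not the authorship probability that µphisher's own metrics aim at.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, digits, @, # or apostrophes.
WORD = re.compile(r"[a-z0-9@#']+")
SENTENCE_END = re.compile(r"[.!?]+")


def words(text):
    return WORD.findall(text.lower())


def sentences(text):
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def average_sentence_length(text_list):
    lengths = [len(words(s)) for t in text_list for s in sentences(t)]
    return sum(lengths) / len(lengths) if lengths else 0.0


def build_profile(texts):
    """Collect word frequencies and average sentence length (in words)."""
    freq = Counter()
    for text in texts:
        freq.update(words(text))
    return {"word_freq": freq, "avg_sentence_len": average_sentence_length(texts)}


def compare(profile, candidate):
    """Compare a candidate text against a profile built by build_profile()."""
    candidate_words = words(candidate)
    known = sum(1 for w in candidate_words if w in profile["word_freq"])
    overlap = known / len(candidate_words) if candidate_words else 0.0
    delta = average_sentence_length([candidate]) - profile["avg_sentence_len"]
    return {"vocab_overlap": overlap, "sentence_len_delta": delta}
```

A vocabulary overlap near 1.0 and a sentence length delta near 0 suggest the candidate resembles the profiled writing; the same numbers could be shown as feedback while a user drafts content.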
Practical limitations: Twitter limits tweet fetching to around 3,200 tweets. All official APIs are rate-limited; handling those limits through asynchronous threads while providing meaningful feedback to the µphisher user can be hard. Some metrics require NLP processing, which fails hard on partial sentences and abbreviations.
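The rate-limit problem mentioned above can be illustrated with a small fetch loop that waits when the API pushes back and reports progress through a callback. Everything here is hypothetical: `fetch_page` stands in for whatever client call returns a page of items plus the next cursor, `RateLimited` is an exception defined for this example, and the 3200 default simply mirrors the Twitter limit quoted in the note. The loop is synchronous; running it in a worker thread (for example with `threading.Thread`) would be one way to keep a user interface responsive.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when the API asks us to wait."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_all(fetch_page, max_items=3200, on_progress=None, sleep=time.sleep):
    """Fetch pages until the source is exhausted or max_items is reached.

    fetch_page(cursor) must return (items, next_cursor) and may raise
    RateLimited; next_cursor is None when there is nothing left.
    """
    items = []
    cursor = None
    while len(items) < max_items:
        try:
            page, cursor = fetch_page(cursor)
        except RateLimited as exc:
            if on_progress:
                on_progress(f"rate limited, waiting {exc.retry_after}s "
                            f"({len(items)} fetched so far)")
            sleep(exc.retry_after)
            continue  # retry the same cursor after waiting
        items.extend(page)
        if on_progress:
            on_progress(f"{len(items)} items fetched")
        if cursor is None or not page:
            break
    return items[:max_items]
```

Passing `sleep` and `on_progress` as parameters keeps the waiting and the user feedback separate from the fetching logic, which also makes the loop easy to exercise without a real API.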