Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation
Georg Knabl
Georg Knabl
• self-employed IT-Consultant & Software Engineer
• based in Graz, Austria
• areas of expertise
• machine learning implementations
• web development
• information security
The Problem with Human Passwords
A Human Attack Vector
• people use password creation schemes
• types
• machine-random (&CtAEaCp?b&v"s%)
• human-general (123456)
• human-individual (John1970!)
• human-random (randomly typed, 34ghjk34f3hjkHGFC)
• What about correct horse battery staple?
• issues
• reduced entropy
• attacker: knowing scheme (+ personal data) => password
• humans limited in creativity
→ somebody else might have come up with same scheme
→ schemes publicly available in password leaks
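To make "reduced entropy" concrete, here is a minimal sketch that compares the password types above in bits. The alphabet size (94 printable ASCII symbols), the 2048-word list for the passphrase, and the 100 candidate birth years are illustrative assumptions, not figures from the talk.

```python
import math


def random_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of `length` symbols drawn uniformly at random."""
    return length * math.log2(alphabet_size)


# Machine-random: 15 characters from the 94 printable ASCII symbols (assumed).
machine_random = random_bits(15, 94)

# Passphrase such as "correct horse battery staple": 4 words drawn at random
# from a word list whose size is assumed to be 2048.
passphrase = random_bits(4, 2048)

# Human-individual with a known scheme: <first name><year of birth>!
# If the attacker knows the first name, only the year is left to guess;
# the span of 100 candidate years is an assumption.
scheme_known = random_bits(1, 100)

print(f"machine-random (15 chars): {machine_random:.1f} bits")
print(f"passphrase (4 words):      {passphrase:.1f} bits")
print(f"known scheme + known name: {scheme_known:.1f} bits")
```

The last figure is the point of the slide: once scheme and personal data are known, almost nothing is left to guess.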
Attacking Passwords
Traditional Approaches
• hybrid or rule-based: dictionaries, word-mangling rules
• Markov models: high-probability character sequences
• masks: reduce set to typical structures
• brute-force: try every possible combination
key space (Dunning, 2016)
• tool support:
hashcat, John-the-Ripper, PACK, CeWL, CUPP, …
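As an illustration of the mask approach, the following sketch derives hashcat-style masks from known passwords and computes the key space a mask covers. The sample list is taken from passwords shown elsewhere in this deck; treating every character that is not an ASCII letter or digit as "?s" is a simplifying assumption.

```python
import string
from collections import Counter

# Sizes of hashcat's built-in charsets: ?l lowercase, ?u uppercase,
# ?d digits, ?s specials (including space).
CHARSET_SIZE = {"?l": 26, "?u": 26, "?d": 10, "?s": 33}


def to_mask(password: str) -> str:
    """Map each character to its hashcat charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        else:
            parts.append("?s")  # simplification: everything else is "special"
    return "".join(parts)


def mask_keyspace(mask: str) -> int:
    """Number of candidates a mask expands to."""
    size = 1
    for i in range(0, len(mask), 2):
        size *= CHARSET_SIZE[mask[i:i + 2]]
    return size


sample = ["123456", "12345", "password", "iloveyou", "princess", "abc123", "John1970!"]
mask_counts = Counter(to_mask(p) for p in sample)

for mask, count in mask_counts.most_common(3):
    print(f"{mask}  seen {count}x, key space {mask_keyspace(mask)}")
```

Sorting masks by frequency relative to key space is the idea behind mask-based attacks: frequent structures with a small key space are tried first.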
Dictionary Sources
• password leaks: rockyou.txt, exploit.in, …
• tailored lists
• CeWL: web scraping
• CUPP: pre-defined questions
Analytics
Website
Designs
Webdesign
Rebranding
passionately
simply
Factory
…
smithJohn@*
smithJohn@@
smithJohn_1
smithSmithy
smith_
smith_01
smith_01050
…
123456
12345
123456789
password
iloveyou
princess
1234567
12345678
abc123
…
Machine-generated Text
Neural Networks
• analyze huge datasets
• learn hidden structures
• reproduce structures on new data
• supervised learning process: train on data → generate model → use model to analyze/generate
Recurrent Neural Networks (RNN)
• learn, analyze, reproduce sequences
• password = sequence of characters
• password list: next password
→ \n: just another character
(Olah, 2015)
RNN Tokenization
Index  Character
0      a
1      b
2      c
3      d
4      e
…      …
92     \n
• training: source data "abc" → 0, 1, 2
• generation: 2, 3, 4 → target data "cde"
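A minimal sketch of this character-level tokenization follows. The alphabet here is a shortened, illustrative one (the slide's full table is elided), with the newline placed last as the password separator. The shifted input/next-character pairs show how next-character training is commonly set up; that part is an assumption about the training loop, not something stated on the slide.

```python
# Illustrative alphabet only: a few letters plus "\n" as password separator.
ALPHABET = "abcde" + "\n"
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def encode(text: str) -> list:
    """Characters -> integer ids (used on the training data)."""
    return [CHAR_TO_ID[ch] for ch in text]


def decode(ids: list) -> str:
    """Integer ids -> characters (used on generated output)."""
    return "".join(ID_TO_CHAR[i] for i in ids)


# A password list is one long character sequence; "\n" ends each password.
corpus = "abc\ncde\n"
ids = encode(corpus)

# Next-character prediction: at each position the model sees inputs[i]
# and is trained to predict next_chars[i].
inputs, next_chars = ids[:-1], ids[1:]

print(encode("abc"))      # ids for the source data
print(decode([2, 3, 4]))  # generated ids mapped back to text
```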
char-rnn
• RNN predicts character sequences based on training text
• by Andrej Karpathy
• https://github.com/karpathy/char-rnn
(Karpathy, 2015)
Works of Shakespeare
[figure: training text and generated output] (Karpathy, 2015)
Linux Source Code
[figure: training text and generated output] (Karpathy, 2015)
rockyou.txt
[figure: training text and generated output]
General Human Password Guessing
• neural networks outperform other methods above 10^10 guesses
• (almost) infinite number of passwords
(Melicher et al., 2016)
Exploiting Individual Human Password Schemes
A Machine Learning Approach
Relevance
• most passwords have individual context
• individual details publicly available (OSINT)
• social media
→ harvester scripts
• website user tables
→ leaked database dumps
• …
exploit.in
Tailored Password Lists
training output
John2050
180374
09091958
06031982
160883
soni
John!
john!
j0hn.5m17h
john.smith
Smith866
asdfghj
John50
Data Protection Compliance
• EU GDPR (General Data Protection Regulation)
• significant fines
• up to €20 million or 4% of worldwide annual revenue
• processing personal data requires a legal basis such as consent
• password lists contain personal information
→ using publicly available leaked data is illegal
• imbalance
• info-sec researcher: has to comply & find (less ideal) alternatives
• attacker: ignores regulations & trains on best available data
Data Protection Compliance
• compliant solutions to collect data
• general passwords:
• use e.g. top-100,000 passwords list
→ no personal details contained
• individual details + passwords:
• compliance based on "public interest"? (GDPR Art. 6 (1) (e))
• collect consent from users
→ requires broad access to user data
a) directly store & relate data until training is finished
→ requires password storage in plaintext (!!!)
b) only store tokenized password schemes without user relation
→ requires all relatable personal data to be known at password hashing time
Challenges
• generate password sequences ✓
• GDPR compliance ?
• recognize & relate individual structures ?
• How to relate personal data?
• same scheme, different character sequences
<first name><year of birth>!
John1985!, Jane1992!
• dealing with obfuscations ?
• e.g. Leetspeak, all upper/lower case
j0hn1985!, JOHN1985!, john1985!
Generating a Dataset Containing Individual Details
• starting point: any password leak that contains a personal identifier
• char-rnn requires > 50,000 entries for proper results
• e.g. exploit.in (797 million credentials): <email address>:<password>
• collect, match and attach personal details to entries
• e.g. using social media harvester
Generating a Dataset Containing Individual Details
• example result:
Gender  Username   First Name  Last Name  Year of Birth  Password
f       margarete  Judy        Wells      1972           Wells106
f       sondra     Lucia       Morrow     1950           cvbnm
f       zakia      Gale        Weiss      1999           syndikat
f       eada       Ana         Elliott    1994           Ana94
f       karalee    Denise      Hanson     1965           OLIVER
m       agatha     Edmond      Daniels    1956           Agatha
…
Password Schemes Used
• Random: random choice of top-X password list (e.g. 123456)
• Easy to Type: nearby characters on keyboard (e.g. qwerty)
• Username: use person's username (e.g. smithy)
• First Name + "!": use person's first name plus exclamation mark (e.g. John!)
• Lowercased First Name + "!": use person's lowercased first name plus exclamation mark (e.g. john!)
• Last Name + Random Int: use person's last name plus a three-digit integer at the end (e.g. Smith758)
• Username Leetspeak: use person's username in Leetspeak (e.g. 5m17hy)
• First Name + Year of Birth (4 digits): use person's first name plus their year of birth (e.g. John1985)
• First Name + Year of Birth (2 digits): use person's first name plus their year of birth in two digits (e.g. John85)
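The schemes listed above could be applied to fake person records roughly as follows. This is a sketch under assumptions: the scheme keys, the stand-in word lists, the Leetspeak map (inferred from examples such as 5m17hy) and the uniform choice between schemes are all hypothetical, not the thesis implementation.

```python
import random

# Leetspeak substitutions inferred from the deck's examples (assumption).
LEET = str.maketrans("aeilost", "4311057")

TOP_PASSWORDS = ["123456", "password", "iloveyou"]  # stand-in for a top-X list
KEYBOARD_WALKS = ["qwerty", "asdfghj", "cvbnm"]     # stand-in for "easy to type"

# One function per scheme; each takes a person record and a random generator.
SCHEMES = {
    "random": lambda p, rng: rng.choice(TOP_PASSWORDS),
    "easy_to_type": lambda p, rng: rng.choice(KEYBOARD_WALKS),
    "username": lambda p, rng: p["username"],
    "first_name_bang": lambda p, rng: p["first_name"] + "!",
    "lower_first_name_bang": lambda p, rng: p["first_name"].lower() + "!",
    "last_name_int": lambda p, rng: p["last_name"] + str(rng.randint(100, 999)),
    "username_leet": lambda p, rng: p["username"].lower().translate(LEET),
    "first_name_year4": lambda p, rng: p["first_name"] + str(p["year_of_birth"]),
    "first_name_year2": lambda p, rng: p["first_name"] + str(p["year_of_birth"])[-2:],
}


def fake_entry(person: dict, rng: random.Random) -> dict:
    """Attach a password built from a randomly chosen scheme."""
    scheme = rng.choice(sorted(SCHEMES))
    return {**person, "scheme": scheme, "password": SCHEMES[scheme](person, rng)}


person = {"username": "smithy", "first_name": "John",
          "last_name": "Smith", "year_of_birth": 1985}
rng = random.Random(0)  # seeded so repeated runs produce the same dataset
print(fake_entry(person, rng))
```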
Tokenization
• replace personal details with column id
• column id is just another character
• problem: exact matching fails to match obfuscations or abbreviations
• John != j0hn
• 1986 != 86
#  First Name  Year of Birth  Password  Resulting Password Tokens
1  Max         1983           Max1983!  column: First Name, column: Year of Birth, !
2  John        1986           John86!   column: First Name, 8, 6, !
3  Max         1987           123456    1, 2, 3, 4, 5, 6
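The exact-matching tokenization in the table can be sketched as below. The function names and the "<column: …>" token spelling are hypothetical; the original implementation is described as matrix-based, which this plain-string version does not attempt to reproduce. The reverse step shows how a learned scheme is filled with a victim's data at generation time.

```python
def tokenize(password: str, details: dict) -> list:
    """Replace exact occurrences of personal details with column tokens."""
    # Try longer values first so a longer detail wins over a shorter prefix.
    candidates = sorted(
        ((value, column) for column, value in details.items() if value),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    tokens, i = [], 0
    while i < len(password):
        for value, column in candidates:
            if password.startswith(value, i):
                tokens.append(f"<column: {column}>")
                i += len(value)
                break
        else:
            tokens.append(password[i])  # no detail matched: keep the character
            i += 1
    return tokens


def detokenize(tokens: list, details: dict) -> str:
    """Fill column tokens with another person's details."""
    prefix = "<column: "
    out = []
    for token in tokens:
        if token.startswith(prefix) and token.endswith(">"):
            out.append(details[token[len(prefix):-1]])
        else:
            out.append(token)
    return "".join(out)


row1 = {"First Name": "Max", "Year of Birth": "1983"}
row2 = {"First Name": "John", "Year of Birth": "1986"}
print(tokenize("Max1983!", row1))
print(tokenize("John86!", row2))  # "86" stays as plain characters

victim = {"First Name": "John", "Year of Birth": "2050"}
print(detokenize(tokenize("Max1983!", row1), victim))
```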
Support Matching Using Data Variations
• add on-the-fly word mangling rules to columns
• Leetspeak
• lowercase
• uppercase
• …
original row:
f  tania  Kara  Rosales
…
expanded row (original, Leetspeak, lowercase, uppercase per column):
f f f F | tania 74n14 tania TANIA | Kara k4r4 kara KARA | Rosales r054135 rosales ROSALES
…
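A minimal sketch of this on-the-fly expansion: each column value is extended by its Leetspeak, lowercase and uppercase variants, so that obfuscated fragments can still be related to a column. The substitution map is inferred from the examples in this deck (74n14, r054135, j0hn.5m17h); the full rule set is not given, and the function names are hypothetical.

```python
# Leetspeak substitutions inferred from the slide examples (assumption).
LEET = str.maketrans("aeilost", "4311057")

# Order matches the expanded row: original, Leetspeak, lowercase, uppercase.
RULES = {
    "original": lambda v: v,
    "leetspeak": lambda v: v.lower().translate(LEET),
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def expand(value: str) -> list:
    """All variants of a single column value."""
    return [rule(value) for rule in RULES.values()]


def expand_row(row: dict) -> dict:
    """Variants for every column of a dataset row."""
    return {column: expand(value) for column, value in row.items()}


def match_rule(fragment: str, value: str):
    """Name of the first rule that turns `value` into `fragment`, else None."""
    for name, rule in RULES.items():
        if rule(value) == fragment:
            return name
    return None


row = {"Gender": "f", "Username": "tania", "First Name": "Kara", "Last Name": "Rosales"}
for column, variants in expand_row(row).items():
    print(column, variants)

print(match_rule("j0hn", "John"))  # obfuscation now relates to the column
print(match_rule("86", "1986"))    # abbreviations are still not covered
```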
Challenges
• generate password sequences ✓
• GDPR compliance ✓
→ use top-X password lists + fake rules
• recognize & relate individual structures ✓
→ column ids instead of individual details
• dealing with obfuscations ✓
→ on-the-fly word mangling rules to extend columns
Implementation
• Python application based on Sean Robertson's char-rnn.pytorch
• https://github.com/spro/char-rnn.pytorch
• adaptations (excerpt)
• matrix-based individual detail matching
• on-the-fly word-mangling rules
Training
epoch 10   epoch 40   epoch 280
Whn        Wh         Wh
carickte   ronis25    ge
aanhls     44353133   butter
cshscarn   maty       jackout
suasso     0598971    05081984
ail        treames    lllllll
zpkoty     bicken     sian
beigedl    ratont     harder
11883469   tulie      chedle
aw         stocker    raven
aeeenl     shathos    11021985
aiseie     netrer     supers
enal       derfa      17031988
faedni     tolei      spike
bnoxtln    dorled     duddick
Attacking the Target
• collect data about victim & generate dataset
• use trained model to generate a tailored password list
• quality of list depends heavily on
• selected training data
• hyperparameter configuration
Gender  Username    First Name  Last Name  Year of Birth
m       john.smith  John        Smith      2050
Results & Qualitative Analysis
Scheme Adoption
Generated password   Scheme: assessment
John2050             First Name + Year of Birth (4 digits): learned
180374               Random: stochastic character generation (mostly human dates)
09091958             (Random)
06031982             (Random)
160883               (Random)
soni                 (Random)
John!                First Name + "!": learned
John!                duplicate because of few available rules
[skipped until line 14]
john!                Lowercased First Name + "!": learned using word mangling
[skipped until line 23]
j0hn.5m17h           Username Leetspeak: learned using word mangling
[skipped until line 30]
john.smith           Username: learned
[skipped until line 80]
Smith866             Last Name + Random Int: partially learned + stochastic generation
[skipped until line 85]
asdfghj              Easy to Type: learned
[skipped until line 514]
John50               First Name + Year of Birth (2 digits): partially learned + stochastic generation
[...]
Gender  Username    First Name  Last Name  Year of Birth
m       john.smith  John        Smith      2050
Proving Password Scheme Adoption
1. use new fake dataset with same schemes
2. loop through each entry and generate an individual password list (1000 entries)
3. check if password is on that list
Gender  Username   First Name  Last Name  Year of Birth  Password
f       margarete  Judy        Wells      1972           Wells106
→ is the password on the generated list?
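The three steps can be sketched as a hit-rate computation. `toy_generator` is a hypothetical stand-in for the trained model (it only enumerates a few fixed schemes), so the resulting numbers illustrate the procedure, not the reported results.

```python
def toy_generator(entry: dict, list_size: int) -> list:
    """Hypothetical stand-in for the trained model: a few fixed schemes."""
    first, last = entry["First Name"], entry["Last Name"]
    year = str(entry["Year of Birth"])
    candidates = [first + year, first + year[-2:], first + "!", first.lower() + "!"]
    candidates += [f"{last}{n}" for n in range(100, 1000)]
    return candidates[:list_size]


def hit_rate(entries: list, generate_list, list_size: int = 1000) -> float:
    """Share of entries whose password is on their individual list."""
    if not entries:
        return 0.0
    hits = 0
    for entry in entries:                          # step 2: one list per entry
        guesses = set(generate_list(entry, list_size))
        if entry["Password"] in guesses:           # step 3: password on list?
            hits += 1
    return hits / len(entries)


# Step 1: a (tiny) fake dataset, rows taken from the example table.
entries = [
    {"First Name": "Judy", "Last Name": "Wells", "Year of Birth": 1972, "Password": "Wells106"},
    {"First Name": "Lucia", "Last Name": "Morrow", "Year of Birth": 1950, "Password": "cvbnm"},
    {"First Name": "Ana", "Last Name": "Elliott", "Year of Birth": 1994, "Password": "Ana94"},
]

for size in (5, 100, 1000):
    print(f"list size {size}: {hit_rate(entries, toy_generator, size):.2f}")
```

Varying `list_size` shows how quickly matches are reached, which is the quantity of interest when comparing model configurations.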
Results
• 6 models with different configurations
• all models match about 70% in password lists of only ~100 lines
• optimized configurations increase matching efficiency
• recreated distributions of schemes
Mitigation
Mitigation Strategies
• generating own model and checking users' passwords against generated lists
• attacker's model and dataset not available
→ password lists will differ
• long or complex passwords
• passwords might still be guessed if they contain personal information
• e.g. JohnSmith1985 is actually <column: firstname><column: lastname><column: year of birth>
• treating all human-like passwords as insecure
• requires classification of human-likeness
Human Password Classification
• using machine learning to classify human-likeness
• dataset (80k human + 80k machine labeled passwords)
• classifiers
• Logistic Regression
• Multinomial Naïve Bayes
• Linear Support Vector Machine
• Random Forest
• vectorizers
• TF-IDF
• Count
Password          Label (h = human, m = machine)
&CtAEaCp?b&v"s%   m
-SUuf4TLtF        m
mallrats          h
bP0.}BO/L&{:      m
^=c.rgH$z         m
boxers            h
j&uzHCutff_A{     m
656565            h
6>IB|~@4^n}K      m
forever1          h
…
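To illustrate the idea, here is a dependency-free toy version of one listed combination: Multinomial Naïve Bayes over character counts. It is trained only on the ten labeled examples above, so it is a sketch of the technique under that assumption and says nothing about the accuracy of the models trained on the 80k + 80k dataset; the class and method names are hypothetical.

```python
import math
from collections import Counter

TRAIN = [
    ('&CtAEaCp?b&v"s%', "m"), ("-SUuf4TLtF", "m"), ("mallrats", "h"),
    ("bP0.}BO/L&{:", "m"), ("^=c.rgH$z", "m"), ("boxers", "h"),
    ("j&uzHCutff_A{", "m"), ("656565", "h"), ("6>IB|~@4^n}K", "m"),
    ("forever1", "h"),
]


class CharNaiveBayes:
    """Multinomial Naive Bayes on single-character counts (Laplace smoothing)."""

    def fit(self, samples):
        self.counts = {}       # label -> Counter of characters
        self.docs = Counter()  # label -> number of passwords
        for password, label in samples:
            self.counts.setdefault(label, Counter()).update(password)
            self.docs[label] += 1
        self.vocab = {ch for c in self.counts.values() for ch in c}
        return self

    def log_scores(self, password):
        total_docs = sum(self.docs.values())
        scores = {}
        for label, counts in self.counts.items():
            # +1 reserves probability mass for characters never seen in training
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(self.docs[label] / total_docs)  # class prior
            for ch in password:
                score += math.log((counts[ch] + 1) / denom)
            scores[label] = score
        return scores

    def p_human(self, password):
        scores = self.log_scores(password)
        top = max(scores.values())
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        return exp["h"] / sum(exp.values())


model = CharNaiveBayes().fit(TRAIN)
for candidate in ["mallrats", "bP0.}BO/L&{:"]:
    print(f"{candidate!r}: p(human) = {model.p_human(candidate):.4f}")
```

A password policy could then reject or warn about candidates whose human score exceeds a chosen threshold.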
Results
• accuracy human vs. machine-random: 99% correct
Password        Score (≈ 1 human, ≈ 0 machine)
14061966        0.9961306540
y-JQ6{v;_yb|q   0.0000000000
ZBT4n#z-x       0.0000121259
longball        0.9920406811
vikings         0.9723564484
gunit           0.9683620674
.XP?]b36nP]l|   0.0000000000
8J9{Bd^         0.0000107884
123india        0.9986476258
*[qg;t          0.0000058089
…
What about randomly-typed passwords?
• human-random passwords
• almost impossible for humans to distinguish
• previously trained model: 83% correct
• specifically trained model (human-random vs. machine-random): 94% correct
,asgl213
HGHfwjiofjiw!?
FEA452
dciuowed7983zy_
jksdgf644kjbndf
Xkkeelt7tad5z
sabjas012
123jfmvfkfn49fvk.
…
Demo
Conclusion
• machine learning can be used to efficiently attack passwords created by humans
• mitigation
• treat human passwords as insecure
• warn users or provide password policy
→ use machine learning model to identify human passwords
→ integrate on web servers & password storage services
Resources
• Thesis "Machine Learning-driven Password Lists":
• https://www.researchgate.net/publication/328719001_Machine_Learning-driven_Password_Lists
• Human Password Classifier:
• https://github.com/georgknabl/human-password-classifier
• ready-to-use trained models available via e-mail
"The only secure password is the one you can't remember."
Troy Hunt (haveibeenpwned.com)
Contact
DI (FH) Georg Knabl, MSc
IT-Consultant & Software Engineer
georg.knabl@pageonstage.at
Sources
• Dunning, Julian (2016). Statistics Will Crack Your Password. Available from: https://p16.praetorian.com/blog/statistics-will-crack-yourpassword-mask-structure [Mar. 3, 2018]
• Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ [Nov. 10, 2017]
• Melicher, William, Blase Ur, Sean M. Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor (2016). "Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks". In: 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, pp. 175–191.
• Olah, Christopher (2015). Understanding LSTM Networks. Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ [Nov. 10, 2017]

 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation

  • 1. Tailored, Machine Learning-driven Password Guessing Attacks and Mitigation Georg Knabl
  • 2. Georg Knabl
      • self-employed IT-Consultant & Software Engineer
      • based in Graz, Austria
      • areas of expertise
          • machine learning implementations
          • web development
          • information security
  • 5. A Human Attack Vector
      • people use password creation schemes
      • types
          • machine-random (&CtAEaCp?b&v"s%)
          • human-general (123456)
          • human-individual (John1970!)
          • human-random (randomly typed, 34ghjk34f3hjkHGFC)
      • What about correct horse battery staple?
      • issues
          • reduced entropy
          • attacker: knowing scheme (+ personal data) => password
          • humans limited in creativity
              → somebody else might have come up with same scheme
              → schemes publicly available in password leaks
  • 7. Traditional Approaches
      • Hybrid or rule-based: dictionaries, word-mangling rules
      • Markov Models: high-probability character sequences
      • Masks: reduce set to typical structures
      • Brute-force: try every possible combination
      • [figure: key space (Dunning, 2016)]
      • tool support: hashcat, John-the-Ripper, PACK, CeWL, CUPP, …
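The gap in key space between brute force and masks can be made concrete with a little arithmetic. The sketch below assumes hashcat's built-in mask placeholders (?l, ?u, ?d, ?s, ?a) with their usual charset sizes; the helper name `mask_keyspace` and the example mask are illustrative and not taken from the talk.

```python
# Charset sizes of hashcat's built-in mask placeholders ("??" is a literal "?").
CHARSET_SIZES = {"?l": 26, "?u": 26, "?d": 10, "?s": 33, "?a": 95, "??": 1}


def mask_keyspace(mask: str) -> int:
    """Return the number of candidates described by a hashcat-style mask."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            total *= CHARSET_SIZES[mask[i:i + 2]]  # two-character placeholder
            i += 2
        else:
            i += 1  # a literal character contributes exactly one choice
    return total


brute_force = mask_keyspace("?a" * 8)          # 8 positions, all printable ASCII
name_year = mask_keyspace("?u?l?l?l?d?d?d?d")  # structure of e.g. "John1985"
print(brute_force, name_year, brute_force // name_year)
```

Under these assumptions the mask for a capitalised four-letter name plus a four-digit year covers a key space more than a million times smaller than an 8-character brute-force run.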
  • 8. Dictionary Sources
      • password leaks: rockyou.txt, exploit.in, …
          • examples: 123456, 12345, 123456789, password, iloveyou, princess, 1234567, 12345678, abc123, …
      • tailored lists
          • CeWL: web scraping
              • examples: Analytics, Website, Designs, Webdesign, Rebranding, passionately, simply, Factory, …
          • CUPP: pre-defined questions
              • examples: smithJohn@*, smithJohn@@, smithJohn_1, smithSmithy, smith_, smith_01, smith_01050, …
  • 10. Neural Networks
      • analyze huge datasets
      • learn hidden structures
      • reproduce structures on new data
      • supervised learning process: train on data → generate model → use model to analyze/generate
  • 11. Recurrent Neural Networks (RNN)
      • learn, analyze, reproduce sequences
      • password = sequence of characters
      • password list: next password → \n: just another character
      • (Olah, 2015)
  • 12. RNN Tokenization
      • vocabulary (id, character): 0 a, 1 b, 2 c, 3 d, 4 e, …, 92 \n
      • training: source data "abc" → 0, 1, 2
      • generation: 2, 3, 4 → target data "cde"
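The character-to-id mapping on the tokenization slide can be sketched in a few lines. The vocabulary below is an assumption (printable ASCII letters, digits and punctuation plus the newline separator), so its size differs from the 93 entries on the slide; `encode` and `decode` are hypothetical helper names.

```python
import string

# Hypothetical vocabulary: one integer id per character; "\n" separates passwords.
ALPHABET = (string.ascii_lowercase + string.ascii_uppercase
            + string.digits + string.punctuation + "\n")
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def encode(text):
    """Training direction: characters to integer ids."""
    return [CHAR_TO_ID[ch] for ch in text]


def decode(ids):
    """Generation direction: integer ids back to characters."""
    return "".join(ID_TO_CHAR[i] for i in ids)


print(encode("abc"))      # ids for the source data example
print(decode([2, 3, 4]))  # text for the generated ids
print(encode("John1970!\n")[-1] == CHAR_TO_ID["\n"])  # newline is just another id
```

Because the newline has its own id, a whole password list can be fed to the network as one long sequence, and the model learns where passwords end.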
  • 13. char-rnn
      • RNN predicts character sequences based on training text
      • by Andrej Karpathy (Karpathy, 2015)
      • https://github.com/karpathy/char-rnn
  • 15. Linux Source Code
      • [figure: training data and generated output (Karpathy, 2015)]
  • 17. General Human Password Guessing
      • neural networks outperform other methods above 10^10 guesses (Melicher et al., 2016)
      • (almost) infinite number of passwords
  • 18. Exploiting Individual Human Password Schemes: A Machine Learning Approach
  • 19. Relevance
      • most passwords have individual context
      • individual details publicly available (OSINT)
          • social media → harvester scripts
          • website user tables → leaked database dumps (e.g. exploit.in)
          • …
  • 20. Tailored Password Lists
      • [figure: training → output]
      • example output: John2050, 180374, 09091958, 06031982, 160883, soni, John!, john!, j0hn.5m17h, john.smith, Smith866, asdfghj, John50
  • 21. Data Protection Compliance
      • EU GDPR (General Data Protection Regulation)
          • significant fines: up to EUR 20 million or 4% of worldwide annual revenue
          • processing personal data requires consent
          • password lists contain personal information
              → using publicly available leaked data is illegal
      • imbalance
          • info-sec researcher: has to comply & find (less ideal) alternatives
          • attacker: ignores regulations & trains on best available data
  • 22. Data Protection Compliance
      • compliant solutions to collect data
          • general passwords: use e.g. a top-100,000 password list → no personal details contained
          • individual details + passwords:
              • compliance based on "public interest"? (GDPR Art. 6 (1) (e))
              • collect consent from users → requires broad access to user data
                  a) directly store & relate data until training is finished → requires password storage in plaintext (!!!)
                  b) only store tokenized password schemes without user relation → requires all relatable personal data to be known at password hashing time
  • 23. Challenges
      • generate password sequences ✓
      • GDPR compliance ?
      • recognize & relate individual structures ?
          • How to relate personal data?
          • same scheme, different character sequences: <first name><year of birth>! → John1985!, Jane1992!
      • dealing with obfuscations ?
          • e.g. Leetspeak, all upper/lower case: j0hn1985!, JOHN1985!, john1985!
  • 24. Generating a Dataset Containing Individual Details
      • starting point: any password leak that contains a personal identifier
          • char-rnn requires > 50,000 entries for proper results
          • e.g. exploit.in (797 million credentials): <email address>:<password>
      • collect, match and attach personal details to entries
          • e.g. using a social media harvester
  • 25. Generating a Dataset Containing Individual Details
      • example result:

        Gender  Username   First Name  Last Name  Year of Birth  Password
        f       margarete  Judy        Wells      1972           Wells106
        f       sondra     Lucia       Morrow     1950           cvbnm
        f       zakia      Gale        Weiss      1999           syndikat
        f       eada       Ana         Elliott    1994           Ana94
        f       karalee    Denise      Hanson     1965           OLIVER
        m       agatha     Edmond      Daniels    1956           Agatha
        …
  • 26. Password Schemes Used
      • Random: random choice of top-X password list (e.g. 123456)
      • Easy to Type: nearby characters on keyboard (e.g. qwerty)
      • Username: use person's username (e.g. smithy)
      • First Name + "!": use person's first name plus exclamation mark (e.g. John!)
      • Lowercased First Name + "!": use person's lowercased first name plus exclamation mark (e.g. john!)
      • Last Name + Random Int: use person's last name plus a three-digit integer at the end (e.g. Smith758)
      • Username Leetspeak: use person's username in Leetspeak (e.g. 5m17hy)
      • First Name + Year of Birth (4 digits): use person's first name plus their year of birth (e.g. John1985)
      • First Name + Year of Birth (2 digits): use person's first name plus their year of birth in two digits (e.g. John85)
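A fake, GDPR-friendly dataset can be produced by applying such schemes to invented person records. The following is a minimal sketch under stated assumptions: the scheme names, the short stand-in lists and the Leetspeak mapping (inferred from examples such as smithy → 5m17hy) are illustrative and not the implementation used in the talk.

```python
import random

# Leetspeak substitutions inferred from the examples (smithy -> 5m17hy).
LEET = str.maketrans("aeilost", "4311057")
# Hypothetical stand-ins for a top-X password list and keyboard walks.
TOP_PASSWORDS = ["123456", "password", "iloveyou"]
KEYBOARD_WALKS = ["qwerty", "asdfghj", "cvbnm"]

# One generator per scheme; each takes a person record and a random source.
SCHEMES = {
    "random": lambda p, rng: rng.choice(TOP_PASSWORDS),
    "easy_to_type": lambda p, rng: rng.choice(KEYBOARD_WALKS),
    "username": lambda p, rng: p["username"],
    "first_name_bang": lambda p, rng: p["first_name"] + "!",
    "lower_first_name_bang": lambda p, rng: p["first_name"].lower() + "!",
    "last_name_random_int": lambda p, rng: p["last_name"] + str(rng.randint(100, 999)),
    "username_leet": lambda p, rng: p["username"].lower().translate(LEET),
    "first_name_year4": lambda p, rng: p["first_name"] + str(p["year_of_birth"]),
    "first_name_year2": lambda p, rng: p["first_name"] + str(p["year_of_birth"])[-2:],
}


def fake_password(person, rng):
    """Pick one scheme at random and apply it to a person record."""
    scheme = rng.choice(sorted(SCHEMES))
    return scheme, SCHEMES[scheme](person, rng)


person = {"username": "john.smith", "first_name": "John",
          "last_name": "Smith", "year_of_birth": 1985}
rng = random.Random(0)  # fixed seed so the fake dataset is reproducible
for _ in range(3):
    print(fake_password(person, rng))
```

Weighting the scheme choice instead of drawing uniformly would let the fake data mimic a particular distribution of schemes.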
  • 27. Tokenization
      • replace personal details with column id
      • column id is just another character
      • problem: exact matching fails to match obfuscations or abbreviations
          • John != j0hn
          • 1986 != 86

        #  First Name  Year of Birth  Password  Resulting Password Tokens
        1  Max         1983           Max1983!  column: First Name, column: Year of Birth, !
        2  John        1986           John86!   column: First Name, 8, 6, !
        3  Max         1987           123456    1, 2, 3, 4, 5, 6
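The replacement of personal details by column ids can be illustrated with a small exact-match tokenizer. This is a sketch only: the function name `tokenize`, the `<column>` token notation and the greedy longest-value-first matching are assumptions, not the matrix-based matching of the actual implementation.

```python
def tokenize(password, details):
    """Replace exact occurrences of personal details with column-id tokens."""
    # Try longer values first so a long detail wins over a shorter overlapping one.
    columns = sorted(details.items(), key=lambda kv: len(kv[1]), reverse=True)
    tokens, i = [], 0
    while i < len(password):
        for column, value in columns:
            if value and password.startswith(value, i):
                tokens.append(f"<{column}>")  # the column id acts as one character
                i += len(value)
                break
        else:
            tokens.append(password[i])  # no detail matched: keep the plain character
            i += 1
    return tokens


print(tokenize("Max1983!", {"first_name": "Max", "year_of_birth": "1983"}))
print(tokenize("John86!", {"first_name": "John", "year_of_birth": "1986"}))
print(tokenize("123456", {"first_name": "Max", "year_of_birth": "1987"}))
```

The second call shows the limitation named on the slide: the two-digit year 86 does not match 1986 exactly, so it stays as plain characters.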
  • 28. Support Matching Using Data Variations
      • add on-the-fly word mangling rules to columns
          • Leetspeak
          • lowercase
          • uppercase
          • …
      • example: the row "f tania Kara Rosales …" is extended to

        Column      Original  Leetspeak  Lowercase  Uppercase
        Gender      f         f          f          F
        Username    tania     74n14      tania      TANIA
        First Name  Kara      k4r4       kara       KARA
        Last Name   Rosales   r054135    rosales    ROSALES
        …
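Extending each column with mangled variants might look like the sketch below. The Leetspeak table is inferred from the slide's examples (tania → 74n14, Rosales → r054135); `MANGLING_RULES`, `extend_columns` and the `column:rule` key format are hypothetical names chosen for illustration.

```python
# Leetspeak substitutions inferred from the slide examples.
LEET = str.maketrans("aeilost", "4311057")

# Word-mangling rules applied on the fly to every column value.
MANGLING_RULES = {
    "leetspeak": lambda value: value.lower().translate(LEET),
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def extend_columns(details):
    """Return every column value plus one extra column per mangling rule."""
    extended = {}
    for column, value in details.items():
        extended[column] = value
        for rule, mangle in MANGLING_RULES.items():
            extended[f"{column}:{rule}"] = mangle(value)
    return extended


row = {"gender": "f", "username": "tania", "first_name": "Kara", "last_name": "Rosales"}
extended = extend_columns(row)
print(extended["username:leetspeak"], extended["first_name:uppercase"])
```

With the extra columns in place, an exact matcher can recognise k4r4 as the Leetspeak variant of the first-name column instead of treating it as unrelated characters.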
  • 29. Challenges
      • generate password sequences ✓
      • GDPR compliance ✓
          → use top-X password lists + fake rules
      • recognize & relate individual structures ✓
          → column ids instead of individual details
      • dealing with obfuscations ✓
          → on-the-fly word mangling rules to extend columns
  • 30. Implementation
      • Python application based on Sean Robertson's char-rnn.pytorch
          • https://github.com/spro/char-rnn.pytorch
      • adaptations (excerpt)
          • matrix-based individual detail matching
          • on-the-fly word-mangling rules
  • 32. Attacking the Target
      • collect data about victim & generate dataset

        Gender  Username    First Name  Last Name  Year of Birth
        m       john.smith  John        Smith      2050

      • use trained model to generate a tailored password list
      • quality of list depends heavily on
          • selected training data
          • hyperparameter configuration
  • 33. Results & Qualitative Analysis
  • 34. Scheme Adoption
      • target:

        Gender  Username    First Name  Last Name  Year of Birth
        m       john.smith  John        Smith      2050

      • generated list (excerpt):

        John2050
        180374
        09091958
        06031982
        160883
        soni
        John!
        John!
        [skipped until line 14]
        john!
        [skipped until line 23]
        j0hn.5m17h
        [skipped until line 30]
        john.smith
        [skipped until line 80]
        Smith866
        [skipped until line 85]
        asdfghj
        [skipped until line 514]
        John50
        [...]

      • annotations
          • Random: stochastic character generation (mostly human dates)
          • First Name + Year of Birth (4 digits): learned
          • Username Leetspeak: learned using word mangling
          • Last Name + Random Int: partially learned + stochastic generation
          • Lowercased First Name + "!": learned using word mangling
          • First Name + "!": learned
          • Easy to Type: learned
          • Username: learned
          • First Name + Year of Birth (2 digits): partially learned + stochastic generation
          • duplicate because of few available rules
  • 35. Proving Password Scheme Adoption
      1. use new fake dataset with same schemes
      2. loop through each entry and generate an individual password list (1000 entries)
      3. check if password is on that list

        Gender  Username   First Name  Last Name  Year of Birth  Password
        f       margarete  Judy        Wells      1972           Wells106 ?
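The three evaluation steps can be sketched as a hit-rate loop. The trained model is not available here, so `toy_generator` is a purely hypothetical stand-in that enumerates a few schemes; `hit_rate` and the record layout are likewise assumptions for illustration.

```python
def hit_rate(entries, generate_list, list_size=1000):
    """Fraction of entries whose password appears in their tailored list."""
    hits = 0
    for entry in entries:
        # The generator only sees the personal details, never the password.
        details = {k: v for k, v in entry.items() if k != "password"}
        if entry["password"] in generate_list(details, list_size):
            hits += 1
    return hits / len(entries) if entries else 0.0


def toy_generator(details, list_size):
    """Hypothetical stand-in for the trained model: enumerate a few schemes."""
    first, last, year = details["first_name"], details["last_name"], details["year_of_birth"]
    candidates = [first + "!", first + year[-2:], first + year]
    candidates += [last + str(n) for n in range(100, 1000)]
    return candidates[:list_size]


entries = [
    {"first_name": "Judy", "last_name": "Wells", "year_of_birth": "1972", "password": "Wells106"},
    {"first_name": "Ana", "last_name": "Elliott", "year_of_birth": "1994", "password": "Ana94"},
    {"first_name": "Lucia", "last_name": "Morrow", "year_of_birth": "1950", "password": "cvbnm"},
]
rate = hit_rate(entries, toy_generator)
print(f"{rate:.2f}")
```

Running the same loop with smaller `list_size` values shows how early in the list the correct password tends to appear.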
  • 36. Results
      • 6 models with different configurations
      • all models match about 70% of passwords within lists of only ~100 lines
      • optimized configurations increase matching efficiency
      • recreated distributions of schemes
  • 38. Mitigation Strategies
      • generating an own model and checking the user's password against generated lists
          • attacker's model and dataset not available → password lists will differ
      • long or complex passwords
          • passwords might still be guessed if they contain personal information
          • e.g. JohnSmith1985 is actually <column: firstname><column: lastname><column: year of birth>
      • treating all human-like passwords as insecure
          • requires classification of human-likeness
  • 39. Human Password Classification
      • using machine learning to classify human-likeness
      • dataset (80k human + 80k machine labeled passwords)
      • classifiers
          • Logistic Regression
          • Multinomial Naïve Bayes
          • Linear Support Vector Machine
          • Random Forest
      • vectorizers
          • TFIDF
          • Count
      • labeled samples (h = human, m = machine):

        Password         Label
        &CtAEaCp?b&v"s%  m
        -SUuf4TLtF       m
        mallrats         h
        bP0.}BO/L&{:     m
        ^=c.rgH$z        m
        boxers           h
        j&uzHCutff_A{    m
        656565           h
        6>IB|~@4^n}K     m
        forever1         h
        …
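To give a feel for how such a classifier works, here is a deliberately simplified sketch: a multinomial Naive Bayes model over counts of four coarse character classes, trained only on the ten sample passwords above. This is an assumption-heavy toy, not the classifier from the talk, which is trained on 160k passwords with the vectorizers listed; all names are hypothetical.

```python
import math
from collections import Counter

# The ten labeled samples from the slide (h = human, m = machine).
TRAINING = [
    ('&CtAEaCp?b&v"s%', "m"), ("-SUuf4TLtF", "m"), ("mallrats", "h"),
    ("bP0.}BO/L&{:", "m"), ("^=c.rgH$z", "m"), ("boxers", "h"),
    ("j&uzHCutff_A{", "m"), ("656565", "h"), ("6>IB|~@4^n}K", "m"),
    ("forever1", "h"),
]
FEATURES = ("lower", "upper", "digit", "symbol")


def char_classes(password):
    """Count vectorizer over four coarse character classes (a simplification)."""
    counts = Counter()
    for ch in password:
        if ch.islower():
            counts["lower"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        else:
            counts["symbol"] += 1
    return counts


def train(samples):
    """Fit log priors and Laplace-smoothed log likelihoods per label."""
    label_counts = Counter(label for _, label in samples)
    feature_counts = {label: Counter() for label in label_counts}
    for password, label in samples:
        feature_counts[label].update(char_classes(password))
    model = {}
    for label, n in label_counts.items():
        total = sum(feature_counts[label].values()) + len(FEATURES)
        loglik = {f: math.log((feature_counts[label][f] + 1) / total) for f in FEATURES}
        model[label] = (math.log(n / len(samples)), loglik)
    return model


def predict(model, password):
    """Return the label with the highest posterior log score."""
    counts = char_classes(password)
    scores = {label: prior + sum(counts[f] * loglik[f] for f in FEATURES)
              for label, (prior, loglik) in model.items()}
    return max(scores, key=scores.get)


model = train(TRAINING)
print(predict(model, "password"), predict(model, "x#9Q!z"))
```

Even this coarse feature set separates all-lowercase words from mixed-class strings; character n-gram features, as produced by a count or TF-IDF vectorizer, would capture far more of what makes a password look human.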
  • 40. Results
      • accuracy human vs. machine-random: 99% correct
      • example scores:

        Password       Score
        14061966       0.9961306540
        y-JQ6{v;_yb|q  0.0000000000
        ZBT4n#z-x      0.0000121259
        longball       0.9920406811
        vikings        0.9723564484
        gunit          0.9683620674
        .XP?]b36nP]l|  0.0000000000
        8J9{Bd^        0.0000107884
        123india       0.9986476258
        *[qg;t         0.0000058089
        …
  • 41. What about randomly-typed passwords?
      • human-random passwords
          • almost impossible for humans to distinguish
          • previously trained model: 83% correct
          • specifically trained model (human-random vs. machine-random): 94% correct
      • examples:

        ,asgl213
        HGHfwjiofjiw!?
        FEA452
        dciuowed7983zy_
        jksdgf644kjbndf
        Xkkeelt7tad5z
        sabjas012
        123jfmvfkfn49fvk.
        …
  • 43. Conclusion
      • machine learning can be used to efficiently attack passwords created by humans
      • mitigation
          • treat human passwords as insecure
          • warn users or provide password policy
              → use machine learning model to identify human passwords
              → integrate on web servers & password storage services
  • 44. Resources
      • Thesis "Machine Learning-driven Password Lists":
          • https://www.researchgate.net/publication/328719001_Machine_Learning-driven_Password_Lists
      • Human Password Classifier:
          • https://github.com/georgknabl/human-password-classifier
      • ready-to-use trained models available via e-mail
  • 45. "The only secure password is the one you can't remember." Troy Hunt (haveibeenpwned.com)
  • 46. Contact
      • DI (FH) Georg Knabl, MSc
      • IT-Consultant & Software Engineer
      • georg.knabl@pageonstage.at
  • 47. Sources
      • Dunning, Julian (2016). Statistics Will Crack Your Password. Available from: https://p16.praetorian.com/blog/statistics-will-crack-yourpassword-mask-structure [Mar. 3, 2018]
      • Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Available from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ [Nov. 10, 2017]
      • Melicher, William, Blase Ur, Sean M Segreti, Saranga Komanduri, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor (2016). "Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks". In: 25th USENIX Security Symposium (USENIX Security 16). Vancouver: USENIX Association, pp. 175–191.
      • Olah, Christopher (2015). Understanding LSTM Networks. Available from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ [Nov. 10, 2017]