Local Password validation using Self-Organizing Maps
ESORICS 2014
Diogo Monica and Carlos Ribeiro (diogo.monica, carlos.ribeiro)@tecnico.ulisboa.pt
Talk Outline 
‣ Background and Motivation 
‣ Our approach 
‣ Performance 
‣ Conclusions
Background and 
Motivation
Background 
Passwords are still the #1 way of doing authentication 
‣ Leaks have shown that: 
‣ Users are bad at choosing passwords 
‣ There is prevalent password re-use across services
Background 
Password validation heuristics are not working
‣ Leaks have shown that they:
‣ Promote weak variations that computers are good at guessing
‣ Positively reinforce bad decisions (password strength meters)
Background 
Password storage is evolving 
‣ The prevalence of bcrypt, scrypt and their variations has made brute-force attacks hard
‣ Time-memory trade off (TMTO) resistance 
‣ GPU unfriendliness (rapid random reads) 
‣ Targeted (dictionary) attacks becoming the norm 
‣ Some of them based on the knowledge of what the 
most common passwords are
Background 
‣ Password managers are still not prevalent
‣ There will always be passwords that people need to choose and memorize (laptop, offline access, password manager’s password, etc.)
Motivation 
‣ Give better password strength feedback 
‣ Remove unnecessary hoops to jump through (strict rules)
‣ Promote the use of passwords that are easy to 
remember but hard to guess
Password Frequency
Core idea 
‣ Use the frequency of past password appearance to 
hamper the effectiveness of statistical dictionary 
attacks 
‣ Some organizations already use basic versions of this 
method 
‣ Twitter prohibits the use of the 370 most common 
passwords 
http://techcrunch.com/2009/12/27/twitter-banned-passwords/
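The basic version of this idea can be illustrated with a minimal Python sketch. The sample list and the names `build_blocklist` and `is_allowed` are hypothetical and only serve the illustration; they are not taken from the talk or from any real deployment.

```python
from collections import Counter


def build_blocklist(observed_passwords, top_n):
    """Return the set of the top_n most frequent passwords in a sample."""
    counts = Counter(observed_passwords)
    return {pw for pw, _ in counts.most_common(top_n)}


# Toy, made-up sample standing in for a leaked password list (assumption).
sample = (
    ["password"] * 5
    + ["123456"] * 4
    + ["qwerty"] * 3
    + ["p@ssw0rd"]
    + ["tr0ub4dor"]
)
blocklist = build_blocklist(sample, top_n=3)


def is_allowed(candidate):
    """Exact-match check: reject only passwords that are on the blocklist."""
    return candidate not in blocklist
```

With this sample, `is_allowed("password")` is false while `is_allowed("p@ssw0rd")` is true: an exact-match list does not catch simple variations of a frequent password.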
Why is it hard? 
‣ Lack of access to representative data-sets 
‣ Computationally expensive 
‣ Offline access (local validation) 
‣ Efficient client distribution (updates) 
‣ Compression 
‣ Potential leak of candidate passwords
Dealing with variations
Frequent Passwords List, with examples of variations:
‣ password: p@ssw0rd, password1, passw0rd, p4ssw0rd, ...
‣ 1111111: 2222222, 3333333, 4444444, 5555555, ...
‣ aaaaa: bbbbb, ccccc, sssss, ddddd, ...
‣ qwerty: asdfg, zxcvb, qw3rty, wertyu, ...
Our goal 
Design a popularity based classification scheme that: 
‣ Resists common password variations (generalization 
capability) 
‣ Allows for offline operation (no centralized authority) 
‣ Is easily distributable to end users’ systems
‣ Total size must be small 
‣ Should not compromise password security 
‣ Testing candidate passwords should be easy and 
inexpensive 
‣ Time, CPU, memory
Our approach
Server-side: 
• Compression 
• Generalization 
• Hashing 
Client-side: 
• Classification 
[Figure: on the server side, the password list (e.g. password, 1111111, aaaa, 123456) goes through compression, generalization and hashing to produce the classification database; the user side downloads it and classifies candidate passwords locally]
Compression
Requirements 
‣ Compress database to allow distribution (e.g. mobile 
phones) 
‣ Compression should not destroy topological proximity 
of the passwords
Self-Organizing 
Maps 
‣ Clustering tool 
‣ Unsupervised neural network 
‣ Reflects in the output space topological proximity 
relations in the input space 
‣ It provides us with an easy way of doing password 
“generalization”
Self-Organizing Maps 
Training process 
High-level Process: 
• Determine which node has the model closest to the input password (the best matching unit, BMU)
• Find the set of all nodes in the lattice neighborhood of the BMU 
• Update the models of all nodes in the lattice neighborhood of the 
BMU, to make them approximate the input password. 
Output: 
The resulting map is a summary replica of the input space, with a 
much lower number of elements but maintaining its topological 
relations. 
Concrete details of the training procedure can be found in the paper
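The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the training procedure from the paper: passwords are encoded as fixed-length vectors of ASCII codes padded with zeros, the plain Euclidean distance is used to pick the BMU, the neighborhood is a square of fixed radius, and the learning rate is constant. The names `encode`, `find_bmu` and `train_step` are hypothetical.

```python
import numpy as np

MAX_LEN = 8  # assumed fixed model length


def encode(password, max_len=MAX_LEN):
    """Encode a password as a zero-padded vector of ASCII codes."""
    codes = [ord(c) for c in password[:max_len]]
    return np.array(codes + [0] * (max_len - len(codes)), dtype=float)


def find_bmu(models, x):
    """Return (row, col) of the node whose model is closest to x."""
    dists = np.linalg.norm(models - x, axis=2)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)


def train_step(models, x, learning_rate=0.5, radius=1):
    """Move the BMU and its lattice neighbors towards x (in place)."""
    rows, cols, _ = models.shape
    bmu_r, bmu_c = find_bmu(models, x)
    for r in range(max(0, bmu_r - radius), min(rows, bmu_r + radius + 1)):
        for c in range(max(0, bmu_c - radius), min(cols, bmu_c + radius + 1)):
            models[r, c] += learning_rate * (x - models[r, c])
    return bmu_r, bmu_c


# Toy 4x4 map with random initial models (assumption).
rng = np.random.default_rng(0)
models = rng.uniform(32, 126, size=(4, 4, MAX_LEN))
x = encode("password")
before = np.linalg.norm(models - x, axis=2)
bmu = train_step(models, x)
after = np.linalg.norm(models - x, axis=2)
```

After one step, the BMU and its neighbors are closer to the input password than before, while nodes outside the neighborhood are unchanged. A full training run would repeat this over all input passwords, typically shrinking the radius and learning rate over time.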
Self-Organizing Maps 
Compression Ratio 
‣ The compression ratio is a function of two quantities (formula not preserved):
‣ the number of input passwords
‣ the number of nodes in the SOM
‣ Important to note: the slide states a property of pmiss that holds for any chosen compression ratio (expression not preserved)
pmiss is the probability of wrongly classifying as safe a password whose occurrence is higher than the threshold
Self-Organizing Maps 
Similarity Measure 
‣ Defines the topological characteristics of the input 
space to be preserved in the output space 
‣ Combines the Euclidean distance between ASCII codes with the Hamming distance
‣ Beta pulls the overall measure of dissimilarity towards a simple human-related overlap distance
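To make the role of the two distances concrete, here is a small Python sketch. The two component distances are standard; how they are combined is an assumption made purely for illustration, since the formula itself is not available here. The convex blend weighted by `beta` and the function names are hypothetical.

```python
import math


def euclidean_ascii(a, b):
    """Euclidean distance between the ASCII codes of two equal-length strings."""
    return math.sqrt(sum((ord(x) - ord(y)) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def dissimilarity(a, b, beta=0.5):
    """ASSUMPTION: convex blend of the two distances, not the paper's formula.

    beta = 0 gives the pure Euclidean distance; beta = 1 gives the pure
    Hamming (overlap) distance.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length strings")
    return (1 - beta) * euclidean_ascii(a, b) + beta * hamming(a, b)
```

For example, "passw0rd" and "password" differ in one character, but 'o' (111) and '0' (48) are 63 apart in ASCII, so the Euclidean term alone treats this common substitution as a large change. The Hamming term counts it as a single changed position, which is closer to how a person would judge the two strings.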
Classification 
‣ Choose a popularity threshold
‣ Determine the BMU of the candidate password 
‣ If the popularity level of the BMU is above the threshold 
the password is rejected.
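The classification rule above can be sketched as follows. This is a simplified illustration, assuming plain (unhashed) models stored as ASCII-code vectors, a Euclidean distance for the BMU search, and a tiny hand-built map; the names `encode` and `classify` and the popularity values are hypothetical.

```python
import numpy as np


def encode(password, length):
    """Encode a password as a zero-padded vector of ASCII codes."""
    codes = [ord(c) for c in password[:length]]
    return np.array(codes + [0] * (length - len(codes)), dtype=float)


def classify(models, popularity, candidate, threshold):
    """Reject the candidate if its BMU's popularity exceeds the threshold."""
    x = encode(candidate, models.shape[2])
    dists = np.linalg.norm(models - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    return "rejected" if popularity[bmu] > threshold else "accepted"


# Toy 1x2 map: one popular node and one unpopular node (made-up values).
models = np.array([[encode("aaaa", 4), encode("zzzz", 4)]])
popularity = np.array([[0.9, 0.1]])
```

Here `classify(models, popularity, "aaab", 0.5)` lands on the popular node and is rejected, while `classify(models, popularity, "zzzy", 0.5)` lands on the unpopular node and is accepted. Note that "aaab" is rejected even though it was never in the map, which is the generalization effect the scheme relies on.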
Generalization
Generalization 
‣ At this point we have a map that can be used for 
password classification 
‣ Similar passwords are adjacent to each other 
‣ By imposing popularity leakage from local maxima to 
neighboring nodes, we increase the generalization 
capability of the network.
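One way to picture popularity leakage is the sketch below. It is an assumption for illustration only, not the filter used by the authors: each node takes the maximum of its own popularity and a decayed copy of its eight lattice neighbors' popularity. The function name `leak_popularity` and the `decay` parameter are hypothetical.

```python
import numpy as np


def leak_popularity(popularity, decay=0.5):
    """Spread popularity from each node to its 8 lattice neighbors.

    ASSUMPTION: a max-based (hence non-linear) filter with a single decay
    factor, chosen only to illustrate the idea of leakage from local maxima.
    """
    rows, cols = popularity.shape
    padded = np.pad(popularity, 1, mode="constant", constant_values=0.0)
    out = popularity.copy()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # View of the map shifted so each cell sees one neighbor.
            neighbor = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            out = np.maximum(out, decay * neighbor)
    return out


# A single popular node in the middle of a 3x3 map (made-up values).
popularity = np.zeros((3, 3))
popularity[1, 1] = 1.0
smoothed = leak_popularity(popularity)
```

In this toy map the center keeps its popularity of 1.0 and every neighbor rises from 0.0 to 0.5, so passwords that map next to a very popular node are also more likely to exceed the rejection threshold.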
Generalization 
Non-linear low pass filtering of the popularity levels
[Figure: a smoothing kernel is applied over the lattice to the popularity label of each node (x,y); the filter equations are not preserved]
Hashing
Hashing 
‣ The models in the SOM are a compressed summary of 
the training passwords 
‣ Security problem 
‣ We will need to find a hash that ensures non-invertibility 
of models 
‣ Can’t destroy the topological proximity
Hashing 
‣ Locality Preserving Hashes 
‣ Deterministically invertible 
‣ Cryptographic Hashes 
‣ Destroy topological proximity by definition
Hashing 
Discrete Fourier Transform 
‣ A linear vector projection that 
we understand 
‣ We can remove the phase 
information, thus avoiding 
invertibility 
[Figure: password → spectrum (complex) → discard phase → autocorrelation spectrum (magnitude)]
‣ The power spectrum of a password 
is always closer to itself than to the 
power spectrum of a different 
password
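A minimal NumPy sketch of the idea of keeping only the magnitude of the DFT is shown below. It assumes a model is represented by the ASCII codes of its characters; the name `spectrum_hash` is hypothetical and any normalization or quantization the authors may apply is not shown.

```python
import numpy as np


def spectrum_hash(model):
    """Magnitude of the DFT of a string's ASCII codes (phase discarded)."""
    codes = np.array([ord(c) for c in model], dtype=float)
    return np.abs(np.fft.fft(codes))


a = spectrum_hash("password")
b = spectrum_hash("dpasswor")  # circular shift of "password"
c = spectrum_hash("passw0rd")  # one-character variation

# Raw Euclidean distance between the two ASCII-code vectors.
raw = np.linalg.norm(
    np.array([ord(ch) for ch in "password"], dtype=float)
    - np.array([ord(ch) for ch in "passw0rd"], dtype=float)
)
```

A circular shift only changes the phase of the spectrum, so `a` and `b` are equal up to rounding: several different inputs share one magnitude spectrum, which is why the input cannot be uniquely recovered from it. Proximity is not destroyed either: by the reverse triangle inequality and Parseval's theorem, the distance between two magnitude spectra is at most sqrt(N) times the distance between the inputs, where N is the length.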
Performance
CM-Sketch 
‣ Oracle to identify undesirably popular passwords 
‣ Uses a count-min sketch 
‣ No false negatives (pmiss = 0)
‣ Centralized operation 
‣ No generalization features 
http://research.microsoft.com/pubs/132859/popularityiseverything.pdf
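For readers unfamiliar with the data structure, a count-min sketch can be written in a few lines of Python. This is a generic textbook-style sketch, not the implementation from the referenced work; deriving the row hashes from SHA-256 with a row prefix is an assumption made for simplicity.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters that never underestimates a count."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One column per row, derived from a row-specific hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is an upper bound.
        return min(self.table[row][col] for row, col in self._indexes(item))


sketch = CountMinSketch(depth=3, width=1000)
for _ in range(5):
    sketch.add("password")
sketch.add("qwerty", 2)
```

Because counters are only ever incremented, `sketch.estimate("password")` is at least 5. A popular password can therefore never be reported as rarer than it is, while an unpopular one may be overestimated when it collides with others in every row.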
Performance
Compression rate and statistical performance
[Figure: SOM vs CM Sketch. x-axis: popularity ranking order of chosen threshold (0 to 10 × 10^4); y-axis: pFA (0 to 1). Series: SOM (140,200), CMSketch(3,28000), CMSketch(3,56000), CMSketch(3,84000), CMSketch(3,112000). Size annotations: ~672Kb, ~834Kb]
‣ CM-sketch false positives are pure statistical classification errors
‣ SOM false positives may result from the desirable generalization properties
Performance
Generalization capability
‣ Is the SOM generalizing in a useful way?
‣ Is the SOM generalizing too much?
John the Ripper mutations testing (passwords flagged as dangerous):
‣ 500 most probable passwords: 100%
‣ All mutations: 84%
‣ Random passwords: 11.3%
Testing different language dictionaries:
[Figure: bar chart of the percentage flagged per language (German, Danish, Dutch, English, Spanish, Italian, Latin); y-axis from 0 to 0.4]
Implementation
Conclusions 
‣ Presented a scheme for password validation 
‣ Envisaged for local, decentralized operation 
‣ Possesses generalization capabilities 
‣ No claims of optimality were made, but it was shown 
that our approach is feasible 
‣ The solution was implemented and tested empirically
Thank you 
Diogo Monica (@diogomonica)
Why didn’t you do 
the DFT at the 
beginning? 
‣ Our inability to come up with a similarity measure that 
has human meaning when working with hashes 
‣ Having the hashes at the beginning would make the 
training computationally more expensive 
‣ By doing them at the end, the hashing function can be 
improved without changing anything in the SOM 
training 
‣ We verified via Monte Carlo simulations that there is 
practically no loss in terms of topological proximity 
when doing the hashes at the end, making this a non-issue
Why did you choose this similarity measure?
‣ We are able to easily calculate distances between models and input passwords
‣ We have the ability to do fractional approximation
‣ Our similarity measure captures a human “closeness” criterion
Why did you choose DFTs?
‣ They are very fast to compute 
‣ They are informationally non-invertible (if we discard 
the phase component) 
‣ They maintained the topological proximity of our SOM 
‣ Something that we can reason about because we 
understand what it means 
‣ The only hashing mechanism that we found that works
