Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
©2013LinkedInCorporation.AllRightsReserved.
1
Server-Side Second Factors!
Approaches to Measuring User Authenticity
David ...
©2016LinkedInCorporation.AllRightsReserved.
Imagine my job…
2
399,800,000+!
ARE NOT SECURITY EXPERTS
400,000,000+!
REGISTE...
©2013LinkedInCorporation.AllRightsReserved.
An Enigma attendee would never…
3
use a common password reuse passwords across...
©2016LinkedInCorporation.AllRightsReserved.
Why take over a LinkedIn account?
!
!
!
– Spam looks more legitimate when it c...
©2016LinkedInCorporation.AllRightsReserved.
Can we force better passwords?
5
§
DILBERT © 2005 Scott Adams. Used By permiss...
©2016LinkedInCorporation.AllRightsReserved.
Alternatives to better passwords?
6
©2016LinkedInCorporation.AllRightsReserved.
We must assume the worst
• The member’s password is weak or known.
• The accou...
©2016LinkedInCorporation.AllRightsReserved.
Scoring login attempts
• Assess level of suspiciousness.
• Require second fact...
– Prove you’re a human
– Establish contact through another channel
– Repeat back information you gave us earlier
©2016Link...
©2016LinkedInCorporation.AllRightsReserved.
Tradeoffs
Second factor needs to be easy for good users, hard for
bad guys.
– ...
©2016LinkedInCorporation.AllRightsReserved.
What data do we have to score logins?
– Request data:
– IP address (and derive...
Heuristics can get you pretty far:
©2016LinkedInCorporation.AllRightsReserved.
Which logins are suspicious?
12
BAD
Not so!...
©2016LinkedInCorporation.AllRightsReserved.
Effectiveness of heuristics
Good at stopping large-scale/indiscriminate attack...
©2016LinkedInCorporation.AllRightsReserved.
Formulating the problem statistically
14
Ultimately, model needs to decide whe...
©2016LinkedInCorporation.AllRightsReserved.
Formulating the problem statistically
15
Ultimately, model needs to decide whe...
©2013LinkedInCorporation.AllRightsReserved.
Computing the likelihood of attack
Asset Reputation Score
(interpreted as a pr...
Pr[attack|u, X]
Pr[legitimate|u, X]
= Pr[attack|X]↵
·
Pr[X]
Pr[X|u]
·
Pr[u|attack]
Pr[u]✏
©2013LinkedInCorporation.AllRigh...
©2016LinkedInCorporation.AllRightsReserved.
Smoothing
18
Q: How do we estimate when X is an IP address
that u has never lo...
©2016LinkedInCorporation.AllRightsReserved.
Experimental Results
Prototype model using two features: 

IP hierarchy & user...
• Protect all users.
• Minimize friction.
• Use both heuristics and machine learning.
• Use statistical models even if you...
©2013LinkedInCorporation.AllRightsReserved.
§
©2016 LinkedIn Corporation. All Rights Reserved.
21
Questions?
dfreeman@link...
Upcoming SlideShare
Loading in …5
×

Server-Side Second Factors: Approaches to Measuring User Authenticity

783 views

Published on

Passwords are used for user authentication by almost every Internet service today, despite a number of well-known weaknesses: passwords are often simple and easy to guess; they are re-used across sites; and they are susceptible to phishing. Numerous methods to replace or supplement passwords have been proposed, such as two-factor authentication or biometric authentication, but none has been adopted widely, leaving most accounts on most websites protected by a password only.

One approach to strengthening password-based authentication without changing user experience is to classify login attempts into *normal* and *suspicious* activity based on a number of parameters such as source IP, geolocation, browser configuration, time of day, and so on. For the suspicious attempts the service can then require additional verification, e.g., by an additional phone-based authentication step. Systems working along these principles have been deployed by many Internet services but have never been studied publicly.

In this work we propose a statistical framework for measuring the validity of a login attempt. We built a prototype implementation and tested on real login data from LinkedIn using only two features: IP address and browser's useragent. We find that we can achieve good accuracy using only *user login history* and *reputation systems*; in particular, a nascent service with no labeled account takeover data can still use our framework to protect its users. When combined with labeled data, our system can achieve even higher accuracy.

Published in: Internet
  • Be the first to comment

Server-Side Second Factors: Approaches to Measuring User Authenticity

  1. 1. ©2013LinkedInCorporation.AllRightsReserved. 1 Server-Side Second Factors! Approaches to Measuring User Authenticity David Freeman! Head of Anti-Abuse Engineering at LinkedIn! ! ! 
 Enigma 2016 San Francisco, CA 25 Jan 2016
  2. 2. ©2016LinkedInCorporation.AllRightsReserved. Imagine my job… 2 399,800,000+! ARE NOT SECURITY EXPERTS 400,000,000+! REGISTERED MEMBERS
  3. 3. ©2013LinkedInCorporation.AllRightsReserved. An Enigma attendee would never… 3 use a common password reuse passwords across sites
 (especially sites that get hacked) get phished tell someone their password …but some of the other 399.8 million might!
  4. 4. ©2016LinkedInCorporation.AllRightsReserved. Why take over a LinkedIn account? ! ! ! – Spam looks more legitimate when it comes from someone you know. ! ! ! – Accounts can have valuable contacts, messages, and other data. 4
  5. 5. ©2016LinkedInCorporation.AllRightsReserved. Can we force better passwords? 5 § DILBERT © 2005 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.
  6. 6. ©2016LinkedInCorporation.AllRightsReserved. Alternatives to better passwords? 6
  7. 7. ©2016LinkedInCorporation.AllRightsReserved. We must assume the worst • The member’s password is weak or known. • The account is not opted in to two-factor authentication. • The attacker could be a bot or human. • Members are not going to change their behavior. 7
  8. 8. ©2016LinkedInCorporation.AllRightsReserved. Scoring login attempts • Assess level of suspiciousness. • Require second factor above some threshold. • Cover all entry points.
  9. 9. – Prove you’re a human – Establish contact through another channel – Repeat back information you gave us earlier ©2016LinkedInCorporation.AllRightsReserved. What second factors could we require? 9
  10. 10. ©2016LinkedInCorporation.AllRightsReserved. Tradeoffs Second factor needs to be easy for good users, hard for bad guys. – Biggest gap: SMS verification – Smallest gap: “First name challenge” vs.
  11. 11. ©2016LinkedInCorporation.AllRightsReserved. What data do we have to score logins? – Request data: – IP address (and derived country, ISP, etc.) – Browser’s useragent (and OS, version, etc.) – Timestamp – Cookies – and more… – Reputation scores for all of the above – Global counters on all of the above – History of member’s previous (successful) logins 11
  12. 12. Heuristics can get you pretty far: ©2016LinkedInCorporation.AllRightsReserved. Which logins are suspicious? 12 BAD Not so! bad BAD BAD
  13. 13. ©2016LinkedInCorporation.AllRightsReserved. Effectiveness of heuristics Good at stopping large-scale/indiscriminate attacks. – Bot attack from Jan 2015: 
 99% blocked on country mismatch – Bot attack from Nov 2015: 
 98% matched country
 100% blocked by rate-limiting ! Not so good at stopping targeted attacks. – 96% of legitimate logins match country – 93% of compromises match country 13
  14. 14. ©2016LinkedInCorporation.AllRightsReserved. Formulating the problem statistically 14 Ultimately, model needs to decide whether ! ! [ Notation: X = user data (timestamp, IP address, browser, etc.) u = user identity ] ! Pr[attack|u, X] Pr[legitimate|u, X] > 1.
  15. 15. ©2016LinkedInCorporation.AllRightsReserved. Formulating the problem statistically 15 Ultimately, model needs to decide whether ! ! But it’s hard to estimate this ratio directly from the data! • Most members are never attacked (numerator is 0) • Only a few samples per member. • Members come from previously unseen values of X
 (IP addresses, browsers, etc.) Pr[attack|u, X] Pr[legitimate|u, X] > 1.
  16. 16. ©2013LinkedInCorporation.AllRightsReserved. Computing the likelihood of attack Asset Reputation Score (interpreted as a probability) Global likelihood of seeing data X Value of account 
 to attacker Appearance of data X in u’s (legitimate) login history Likelihood of member u logging in Pr[attack|u, X] Pr[legitimate|u, X] = Pr[attack|X] · Pr[X] Pr[X|u] · Pr[u|attack] Pr[u] After a few assumptions and a lot of Bayes’ rule, we get: No per-member attack data required!
  17. 17. Pr[attack|u, X] Pr[legitimate|u, X] = Pr[attack|X]↵ · Pr[X] Pr[X|u] · Pr[u|attack] Pr[u]✏ ©2013LinkedInCorporation.AllRightsReserved. Computing the likelihood of attack Asset Reputation Score (interpreted as a probability) Global likelihood of seeing data X Value of account 
 to attacker Appearance of data X in u’s (legitimate) login history Likelihood of member u logging in If you have labeled attack data, use it to learn feature weights. After a few assumptions and a lot of Bayes’ rule, we get:
  18. 18. ©2016LinkedInCorporation.AllRightsReserved. Smoothing 18 Q: How do we estimate when X is an IP address that u has never logged in from? ! A: We have auxiliary information about unseen IPs: ! ! ! ! • Use ISP- or country-level data to estimate probabilities. • Give higher weight to unseen events from a known ISP. § § § World USA AT&T 1.2.3.4 Pr[X|u]
  19. 19. ©2016LinkedInCorporation.AllRightsReserved. Experimental Results Prototype model using two features: 
 IP hierarchy & useragent hierarchy 
 (useragent, browser version, browser, OS) ! Test data: 
 (a) 6 months of compromised accounts
 (b) botnet observed in Jan 2015 ! ! ! 19 Results Country Match Model Result Model+Heuristics Botnet 99% 95% 99% Compromised accounts 7% 77% 81% False positives 4% 10% 3%
  20. 20. • Protect all users. • Minimize friction. • Use both heuristics and machine learning. • Use statistical models even if you don’t have good labeled data. ©2016LinkedInCorporation.AllRightsReserved. Take-aways 20
  21. 21. ©2013LinkedInCorporation.AllRightsReserved. § ©2016 LinkedIn Corporation. All Rights Reserved. 21 Questions? dfreeman@linkedin.com [p.s. we’re hiring too!] Login scoring model is joint work with Sakshi Jain (LinkedIn), Markus Dürmuth (Ruhr Universität Bochum), and Battista Biggio and Giorgio Giacinto (Università di Cagliari), to appear at NDSS ’16.

×