2012.09 A Million Mousetraps: Using Big Data and Little Loops to Build Better Defenses

An examination of how behavioral analytics can be leveraged to design better defenses in complex user-facing platforms.

Published in: Technology

  1. 1. A Million Mousetraps Using Big Data and Little Loops to Build Better Defenses Allison Miller
  2. 2. Overview Protecting customers on an open platform Big data + Little loops enable automation via analytics Decisions as defenses Putting your data to work
  3. 3. the interdependent system
  4. 4. the porous attack surface
  5. 5. so, about that perimeter... Spam, Credential Theft, Malware, Bots, Account takeover, Fraud, DOS, Phishing, Griefers, Scammers
  6. 6. The Better Mousetrap: Automates defensive action x-platform - Fast: in real time, in time to minimize loss - Accurate: reasonable false positives, as good as a human specialist - Cheap: reduces more loss than cost created, cheaper than manual intervention. BIG DATA & LITTLE LOOPS
  7. 7. BIG DATA & LITTLE LOOPS
  8. 8. 123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
        123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
        123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
        [Tue Mar 9 22:02:41 2004] [info] created shared memory segment #10813446
        [Tue Mar 9 22:02:41 2004] [notice] Apache/1.3.29 (Unix) mod_ssl/2.8.16 OpenSSL/0.9.7c configured -- resuming normal operations
        [Tue Mar 9 22:02:41 2004] [info] Server built: Mar 7 2004 13:38:59
        pausing [http://xmlrevenue.com/s.php?username=jenneypan&keywords=Online+Gambling] for 50000 ms
        [Tue Mar 9 22:04:16 2004] [error] [client 218.93.92.137] mod_security: Access denied with code 200. Pattern match "Basic" at HEADER.
        [Tue Mar 9 22:07:16 2004] [error] [client 203.121.182.190] mod_security: Invalid character detected [4]
        123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
        123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
        123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
        [Tue Mar 9 22:02:41 2004] [notice] Accept mutex: sysvsem (Default: sysvsem)
        [Tue Mar 9 22:03:26 2004] [error] [client 218.93.92.137] mod_security:
        BIG DATA & LITTLE LOOPS
  9. 9. BIG DATA & LITTLE LOOPS * Loop Disposition: Logic, Human, or Other?
  10. 10. APPLIED RISK ANALYTICS Use of technology, data, research & statistics to solve problems associated with losses or costs due to security vulnerabilities / gaps in a system -- resulting in the deployment of optimized detection, prevention, or response capabilities.
  11. 11. BRIEF TANGENT
  12. 12. WHAT IS THE DIFFERENCE BETWEEN RISK ANALYTICS AND RISK METRICS?
  13. 13. METRICS ANALYTICS
  14. 14. Such as...
        Metrics | Analytics
        $ Loss Txns | Purchase trends of high loss users
        # Compromised Accts | IP Sources of bad login attempts
        % of Spam Messages Delivered | Spam subject lines generating most clicks
        Minutes of downtime | Most process-intensive applications
        # Customer Contacts Generated | Highest-contact exception flows
  15. 15. YMMV
  16. 16. END TANGENT
  17. 17. Applied where? Where risks manifest in observable behavior Where system owners make decisions Where controls can be optimized by better recognizing identity, intent, or change
  18. 18. Decisions, Decisions. RESPONSE (Authorize / Block) vs. POPULATION (Good / Bad): Good + Block = false positive (Good Action Gets Blocked); Bad + Authorize = false negative (Bad Action Gets Through). Incorrect decisions have a cost. Correct decisions are free (usually). Downstream Impacts
  19. 19. BIG DATA & LITTLE LOOPS Why are you picking on me? Boo-yah! Still getting away with it. <Sigh> Nobody understands me.
  20. 20. Such as... Populations - Users, Transactions, Messages, Packets, API calls, Files Actions - Allow, Block, Challenge, Review, Retry, Quarantine, Add privileges, Upgrade privileges, Make Offer Costs - Fraud, Data leakage, Customer churn, Customer contacts, Downstream liability
  21. 21. Applying Decisions Risk management is decision management ACTOR ATTEMPTS ACTION SUBMIT WHAT IS THE REQUEST HOW TO HONOR THE REQUEST SHOULD WE HONOR? RESULT ACTION OCCURS
  22. 22. For example: ACTOR ATTEMPTS PAYMENT p (actor attempting payment is accountholder) Decision Authorize Review Refer Request Authentication Decline f(variable A + Variable B + ...) SUBMIT
  23. 23. Flavors of Risk Models I deviate significantly from a normal (good) pattern I summarize a known bad pattern fa(x), fb(x), fc(x) fq(x), fr(x), fs(x)
  24. 24. What is normal? http://en.wikipedia.org/wiki/Normal_distribution WHAT IS BAD? WHAT IS GOOD?
  25. 25. Study history... Who What Where When Why And then?
  26. 26. Study history... User IP Country <> Billing Country Buying prepaid mobile phones Add new shipping address in cart However Buyer = Phone reseller, static machine ID How much $$ is at risk? What is “normal” for this customer? What “bad” profiles does this match?
  27. 27. SHALL WE PLAY A GAME? (SINCE WE CAN’T PLAY “CLUE” FOR EVERY LOGIN, TRANSACTION, NEW USER, MESSAGE, FRIEND REQUEST, ATTACHMENT, PACKET, WINK, POKE, CLICK, WE BUILD RISK MODELS)
  28. 28. Model Development Process: Target -> Yes/No questions best; Find Data, Variable Creation -> Best part; Data Prep -> Worst part; Model Training -> Pick an algorithm; Assessment -> Catch vs FP rate; Deployment -> Decisioning vs Detection
  29. 29. User IP Country <> Billing Country Buying prepaid mobile phones Add new shipping address in cart Buyer = Phone reseller, static machine ID How much $$ is at risk? What is “normal” for this customer? What “bad” profiles does this match? GEOLOCATE IP CONVERT GEO TO COUNTRY CODE FLAG ON MISMATCH CART CATEGORY MERCH RISK LEVEL DATE ADDED ADDRESS TYPE STRING MATCHING CUSTOMER PROFILE DEVICE ID DEVICE HISTORY TXN-$-AMT CHURN RISK, CLV, TXNS, LOGINS, STOLEN CC,
  30. 30. Model Training Some algorithms: - Regression: Determines the best equation to describe the relationship between the dependent variable and the independent variables. Linear Regression: Best equation is a line. Logistic Regression: Best equation is a curve (exponential properties) - Bayesian: Used to estimate regression models, useful when working w/small data sets - Neural Nets: Can approximate any type of non-linear function, often highly predictive, but don’t explain the relationship between the dependent and independent variables
  31. 31. LOGISTIC <DEPVAR> <VAR1> <VAR2>...
  32. 32. P-VALUE OF SIGNIFICANCE, THROW OUT IF > .05; VARIANCE IN DEPENDENT VARIABLE EXPLAINED BY INDEPENDENT VARIABLES; DEPENDENT VARIABLE; INDEPENDENT VARIABLES; FACTOR ODDS OF DEPENDENT GO UP WHEN INDEPENDENT VAR INCREMENTED; P-VALUE SHOULD BE < SIGNIFICANCE LEVEL (.05)
  33. 33. GAIN More gain/lift = more efficient predictions Catch as much as possible (as much of the “bads”) Minimize the overall affected
  34. 34. Target In the end, we only hit what we aim at
  35. 35. And now an example Everyone loves a good 419 scam
  36. 36. 419 example: the 411 Trigger - Contact receives 419 from a (free) business email account, who contacts victim OOB Backtrack - Password was changed (user had to go through reset process) - Contacts, inbox, outbox deleted - Nigerian IP login Elaboration - “Reply-to”: changed an “i” to an “l” (same ISP) - Only takes Western Union
  37. 37. 419 example: with love, from Abuja What is the question? - p(ATO) - p(Spam:scam) - p(Fake acct creation) What are our available answer/action sets? What else can we do to detect/mitigate?
  38. 38. 419 example: Reducing 911s Variables - “New” session variables: New login IP, new login IP country, new cookie/machine ID - “Change” account variables: Change password, change secondary email, change name, change public profile - “New” activity variables: Send to all contacts, # of accounts in “cc” or “bcc”, Edit/delete contacts en masse - Association variables: New recipients, New “reply-to” fields, “Similar” accounts created/associated (fuzzy=more difficult) User empowerment - Stronger password reset options (SMS) - Transparency: Other current sessions, past session history (IPs, logins) - Auto-logout all other sessions upon password reset - Reporting: Details of elaboration as well as cut and paste messages
  39. 39. Recap Protecting customers requires understanding not just technology but also behavior. This requires: - Activity data - Clear definitions of “good” vs “bad” results - Constant feedback - Analysis Designing data-driven defenses - Decisions that can be automated w/data - Where/what data sets to use - Business drivers to keep in mind An example BIG DATA & LITTLE LOOPS p (bad) f(variable A + Variable B + ...)
  40. 40. Prediction is very difficult, especially about the future Niels Bohr Allison Miller @selenakyle
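
Note on slide 8: the raw log lines shown there only become "big data" once they are parsed into fields. The sketch below is one way to do that in Python for the access-log entries, assuming they follow the Apache combined log format; the field names, `parse_line`, and `requests_per_ip` are illustrative and not taken from the talk.

```python
import re
from collections import Counter

# Pattern for Apache "combined" access-log lines. The group names are
# illustrative choices, not part of the original slides.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not an access-log entry."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def requests_per_ip(lines):
    """Count parsed requests per client IP, skipping lines that do not parse."""
    parsed = (parse_line(line) for line in lines)
    return Counter(record["ip"] for record in parsed if record)


# Two access-log entries and one error-log entry, as on slide 8.
SAMPLE = [
    '123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" '
    '200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"',
    '123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" '
    '200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" '
    '"Mozilla/4.05 (Macintosh; I; PPC)"',
    '[Tue Mar 9 22:02:41 2004] [info] created shared memory segment #10813446',
]

if __name__ == "__main__":
    print(requests_per_ip(SAMPLE))
```

Error-log and mod_security lines use a different layout and would need their own pattern; here they are simply skipped.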
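
Note on slide 18: the point that incorrect decisions have a cost while correct ones are (usually) free can be made concrete by counting false positives and false negatives and pricing each. This is a minimal sketch; the per-error dollar costs and the function name `decision_costs` are assumptions for illustration only.

```python
def decision_costs(decisions, cost_false_positive, cost_false_negative):
    """Tally wrong decisions and their total cost.

    decisions: iterable of (is_bad, blocked) boolean pairs, one per action.
    Correct decisions (good authorized, bad blocked) are treated as free.
    """
    decisions = list(decisions)
    # Good action gets blocked.
    false_positives = sum(1 for is_bad, blocked in decisions if blocked and not is_bad)
    # Bad action gets through.
    false_negatives = sum(1 for is_bad, blocked in decisions if is_bad and not blocked)
    total = (false_positives * cost_false_positive
             + false_negatives * cost_false_negative)
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "total_cost": total,
    }


# Hypothetical population of five actions: (is_bad, blocked).
EXAMPLE = [
    (False, False),  # good, authorized: correct
    (False, True),   # good, blocked: false positive
    (True, True),    # bad, blocked: correct
    (True, False),   # bad, authorized: false negative
    (False, False),  # good, authorized: correct
]

if __name__ == "__main__":
    # Assumed costs: 5.0 per annoyed good customer, 50.0 per missed bad action.
    print(decision_costs(EXAMPLE, cost_false_positive=5.0, cost_false_negative=50.0))
```

Comparing `total_cost` across candidate rules is one way to check the slide 6 requirement that a defense reduces more loss than the cost it creates.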
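
Note on slide 22: the flow there (score the request, then choose among Authorize, Review, Request Authentication, Decline) can be sketched as a threshold policy over a model score. The sketch uses p(bad) as in the recap on slide 39 rather than p(accountholder); the threshold values, the action ordering, and the omission of "Refer" are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    AUTHORIZE = "authorize"
    REQUEST_AUTHENTICATION = "request_authentication"
    REVIEW = "review"
    DECLINE = "decline"


# Hypothetical cut-offs: each pair is (exclusive upper bound on p_bad, action).
# Real thresholds would be tuned against false-positive and loss costs.
POLICY = [
    (0.10, Action.AUTHORIZE),
    (0.40, Action.REQUEST_AUTHENTICATION),
    (0.80, Action.REVIEW),
]


def decide(p_bad):
    """Map a risk score in [0, 1] to an action; scores above all bounds decline."""
    if not 0.0 <= p_bad <= 1.0:
        raise ValueError("p_bad must be a probability between 0 and 1")
    for upper_bound, action in POLICY:
        if p_bad < upper_bound:
            return action
    return Action.DECLINE


if __name__ == "__main__":
    for score in (0.02, 0.25, 0.60, 0.95):
        print(score, decide(score).value)
```

Keeping the thresholds in a table rather than in code branches makes the "little loop" easier to run: the cut-offs can be adjusted as feedback on false positives and losses arrives.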
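
Note on slides 30 to 32: a logistic regression relates a yes/no dependent variable to independent variables, and exponentiating a fitted coefficient gives the factor by which the odds change when that variable is incremented. The sketch below fits one with plain batch gradient descent so that it needs no libraries; the toy data, the two feature names, and the learning-rate and epoch settings are invented for illustration, and it does not compute the p-values a statistics package would report.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that the dependent variable is 1 for features x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical features: [ip_country_mismatch, new_shipping_address].
ROWS = [
    [0, 0], [0, 0], [0, 0], [0, 0],
    [0, 1], [0, 1],
    [1, 0], [1, 0],
    [1, 1], [1, 1], [1, 1], [1, 1],
]
# Hypothetical labels: 1 = transaction later confirmed bad.
LABELS = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

if __name__ == "__main__":
    weights, bias = train_logistic(ROWS, LABELS)
    for name, w in zip(["ip_country_mismatch", "new_shipping_address"], weights):
        # exp(coefficient) is the odds ratio for a one-unit increase.
        print(name, "odds ratio:", round(math.exp(w), 2))
    print("p(bad | both flags):", round(predict(weights, bias, [1, 1]), 2))
```

In practice the fit would come from a statistics package, which also reports the significance values annotated on slide 32.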
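
Note on slide 33: gain and lift measure how many of the "bads" a model catches while affecting only a small share of the population. A minimal sketch, assuming each case has a model score and a known outcome; the function names and the example scores are hypothetical.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all bad cases found in the top `fraction` of cases by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    total_bad = sum(labels)
    if total_bad == 0:
        return 0.0
    caught = sum(label for _, label in ranked[:cutoff])
    return caught / total_bad


def lift(scores, labels, fraction):
    """Gain relative to what random selection of the same share would catch."""
    cutoff = max(1, round(len(scores) * fraction))
    share_affected = cutoff / len(scores)
    return cumulative_gain(scores, labels, fraction) / share_affected


# Hypothetical scores for ten cases; label 1 marks a confirmed bad case.
SCORES = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
LABELS = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("gain at top 20%:", cumulative_gain(SCORES, LABELS, 0.2))
    print("lift at top 20%:", lift(SCORES, LABELS, 0.2))
```

Here acting on the top 20% of cases catches two of the three bads, which is the "catch as much as possible, minimize the overall affected" trade-off the slide describes.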
