
Straight Talk on Machine Learning -- What the Marketing Department Doesn’t Want You to Know

Sponsored workshop at Black Hat USA 2017

https://www.blackhat.com/us-17/business-hall/schedule/#straight-talk-on-machine-learning----what-the-marketing-department-doesnt-want-you-to-know-8203

  1. STRAIGHT TALK ON MACHINE LEARNING: WHAT THE MARKETING DEPARTMENT DOESN’T WANT YOU TO KNOW. Dr. Sven Krasser, Chief Scientist, @SVENKRASSER
  2. MACHINE LEARNING AT CROWDSTRIKE
     • ~50 billion events per day
     • ~800 thousand events per second at peak
     • ~700 trillion bytes of sample data
     • Local decisions on the endpoint and large-scale analysis in the cloud
     • Static and dynamic analysis techniques, various rich data sources
     • Analysts generating new ground truth 24/7
     2017 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  3. BRIEF ML PRIMER (plot: Height [mm] vs. Weight [10⁻¹ kg]) • What’s this? http://tinyurl.com/MLprimer • Two features • Two classes
  4. BETTER FEATURES (plot: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  5. MODEL FIT (plot: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg]) • Support Vector Machine • Real world: more features
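Slides 3 to 5 describe fitting a boundary that separates two classes using two features. The sketch below is a minimal, hypothetical illustration: it swaps in a perceptron (a simpler linear classifier) for the support vector machine named on the slide, and the toy points are invented, pre-scaled values, not the height/weight data from the plot.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Model = Tuple[float, float, float]  # (w1, w2, b)


def train_perceptron(xs: Sequence[Point], ys: Sequence[int], max_epochs: int = 10_000) -> Model:
    """Fit a linear boundary w1*x1 + w2*x2 + b = 0; labels must be +1 or -1.

    A perceptron, used here as a stand-in for the slide's SVM. On linearly
    separable data it is guaranteed to reach a pass with zero mistakes.
    """
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(xs, ys):
            # Misclassified (or exactly on the boundary): move the boundary toward y.
            if y * (w1 * x1 + w2 * x2 + b) <= 0:
                w1 += y * x1
                w2 += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: training has converged
            break
    return w1, w2, b


def predict(model: Model, point: Point) -> int:
    """Return +1 or -1 depending on which side of the boundary the point falls."""
    w1, w2, b = model
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else -1


# Hypothetical, pre-scaled two-feature samples for two classes.
xs: List[Point] = [(2.0, 2.0), (2.5, 3.0), (3.0, 2.5), (0.0, 0.5), (0.5, 0.0), (1.0, 1.0)]
ys: List[int] = [1, 1, 1, -1, -1, -1]
model = train_perceptron(xs, ys)
```

A perceptron stops at any boundary that separates the training points, whereas an SVM picks the boundary with the largest margin to the nearest points of each class; both produce a linear rule of the same form.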
  6. LET’S CLASSIFY (plot: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  10. LET’S CLASSIFY (plot: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg]) • Shifting the boundary gets more “blue” right (true positives) • But it also gets more “red” wrong (false positives)
  11. RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE (plot: True Positive Rate vs. False Positive Rate) • Detect more by accepting more false positives
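An ROC curve is traced by sweeping the decision threshold over a classifier’s scores and recording the false positive rate and true positive rate at each step. The sketch below is a minimal version that assumes higher scores mean “more likely malicious”, labels of 1 (malicious) and 0 (clean), and a sample being flagged when its score is at or above the threshold; the scores in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs from the strictest threshold to the loosest.

    labels: 1 = malicious (positive), 0 = clean (negative).
    A sample is flagged when its score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.2]` and labels `[1, 1, 0, 1, 0, 0]`, lowering the threshold from 0.8 to 0.7 adds a false positive without a new true positive, and lowering it further to 0.6 picks up the last true positive: detecting more by accepting more false positives.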
  12. MORE DIMENSIONS • Some 160 dimensions • Projected back to a 2-dimensional screen • Perfect separation
  13. CURSE OF DIMENSIONALITY • REDUCED predictive performance • INCREASED training time • SLOWER classification • LARGER memory footprint
  14. Source: https://commons.wikimedia.org/w/index.php?curid=2257082
  16. DIMENSIONALITY AND SPARSENESS (plot: Height [mm] vs. Weight [10⁻¹ kg])
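One way to see why data becomes sparse as dimensions grow: cut every feature axis into a fixed number of bins and count the grid cells. The sketch below assumes equal-width bins and computes an upper bound on the fraction of cells a given number of samples can occupy; the sample count and bin count in the loop are hypothetical.

```python
def max_occupied_fraction(n_samples: int, bins_per_feature: int, n_features: int) -> float:
    """Upper bound on the fraction of grid cells that n_samples points can occupy.

    Each feature axis is cut into bins_per_feature intervals, giving
    bins_per_feature ** n_features cells; each sample lands in exactly one cell.
    """
    cells = bins_per_feature ** n_features  # exact integer, even for large n_features
    return min(1.0, n_samples / cells)


# Hypothetical setup: one million samples, ten bins per feature.
for d in (2, 6, 9, 160):
    print(d, max_occupied_fraction(1_000_000, 10, d))
```

With one million samples and ten bins per feature, every cell can be filled at two or six features, but at nine features at most 0.1% of the cells contain any sample, and at the roughly 160 dimensions mentioned on slide 12 almost the entire space is empty.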
  18. ML IN INFOSEC APPLICATIONS
     • Not a single model solving everything
     • But many models working on the data in scope
     • Endpoint vs. cloud
     • Fast response vs. long observation
     • Lean vs. resource-intensive
     • Effectiveness vs. interpretability
     • Avoid ML blinders
     • The guy in your store at 2am wielding a crowbar is not a customer
  19. FILE ANALYSIS, AKA STATIC ANALYSIS
     • THE GOOD
       – Relatively fast to detect malware
       – Scalable
       – No need to detonate (“pre-execution”)
       – Platform-independent; can be done on the endpoint or in the cloud
     • THE BAD
       – Limited insight due to narrow view
       – Different file types require different techniques
       – Different subtypes need special consideration: packed files, .NET, installers, EXEs vs. DLLs
       – Obfuscation (though detectable obfuscation is itself a useful signal)
       – Ineffective against exploitation and malware-less attacks
       – Asymmetry: the defender has a fraction of a second to decide, the attacker has months to craft
  20. VIRUSTOTAL INTEGRATION
  21. ENGINEERED FEATURES FOR EXECUTABLE FILES: 32/64-BIT EXECUTABLE • SUBSYSTEM TYPE • MACHINE INSTRUCTION DISTRIBUTION • FILE SIZE • TIMESTAMP • DEBUG INFORMATION PRESENT • PACKER TYPE • FILE ENTROPY • NUMBER OF SECTIONS • NUMBER OF WRITABLE SECTIONS • NUMBER OF READABLE SECTIONS • NUMBER OF EXECUTABLE SECTIONS • DISTRIBUTION OF SECTION ENTROPY • IMPORTED DLL NAMES • IMPORTED FUNCTION NAMES • COMPILER ARTIFACTS • LINKER ARTIFACTS • RESOURCE DATA • PROTOCOL STRINGS • IPS/DOMAINS • PATHS • PRODUCT METADATA • DIGITAL SIGNATURE • ICON CONTENT • …
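File entropy and section entropy appear in the feature list above. The slide does not define them; a common choice, assumed here, is the Shannon entropy of the byte-value distribution, measured in bits per byte. The sketch below computes it for any byte string; applying it per section would additionally require a PE parser, which is not shown.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

A run of one repeated byte scores 0.0 and a buffer containing each of the 256 byte values once scores the maximum of 8.0. Compressed or encrypted content tends to sit close to 8 bits per byte, which is one reason entropy can hint at packing.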
  22. ENGINEERED FEATURES FOR EXECUTABLE FILES (same feature list as slide 21, annotated by encoding) • Continuous features • Categorical features • n-hot encoding • Embedding
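Categorical features such as imported DLL names can be turned into numbers with n-hot encoding: one vector position per known name, set to 1 when the file imports it. The sketch below is a minimal version under stated assumptions: the vocabulary is built from hypothetical training files, names are compared case-insensitively, and names never seen in training are simply dropped. The helper names are illustrative, not part of any CrowdStrike tooling.

```python
from typing import Dict, Iterable, List


def build_vocabulary(samples: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign each distinct (lower-cased) name a vector position, in first-seen order."""
    vocab: Dict[str, int] = {}
    for names in samples:
        for name in names:
            key = name.lower()
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def n_hot(names: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """1 at the position of every known name present, 0 elsewhere; unknown names are ignored."""
    vec = [0] * len(vocab)
    for name in names:
        idx = vocab.get(name.lower())
        if idx is not None:
            vec[idx] = 1
    return vec


# Hypothetical import lists from two training files.
training_imports = [["KERNEL32.dll", "USER32.dll"], ["kernel32.dll", "WS2_32.dll"]]
vocab = build_vocabulary(training_imports)
```

Unlike one-hot encoding, several positions can be 1 at once because a file imports many DLLs. The embedding option on the slide would instead map each name to a dense learned vector.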
  23. LEARNED FEATURES • Unstructured file content • Algorithm uncovers interesting properties • Requires a lot more input data • Unlocks more insight
  24. COMBINING FEATURES (plot: string-based feature vs. executable-section-size-based feature)
  25. COMBINING FEATURES (plot: Subspace Projection A vs. Subspace Projection B)
  26. CLASSIFICATION PERFORMANCE
  27. 99% DETECTION RATE, 1% FALSE POSITIVES: Malware?
  28. 99% DETECTION RATE, 1% FALSE POSITIVES: Malware
  29. 99% DETECTION RATE, 1% FALSE POSITIVES: Not Malware
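Whether a file flagged by a detector with a 99% detection rate and a 1% false positive rate really is malware depends on how common malware is among the files scanned, which the slides do not state. Bayes’ rule makes this concrete; the sketch below assumes the two rates above and treats the prevalence as a hypothetical input.

```python
def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Probability that a flagged file is malware, by Bayes' rule.

    tpr: detection rate on malware; fpr: false positive rate on clean files;
    prevalence: fraction of scanned files that are malware (an assumption).
    """
    true_alerts = tpr * prevalence
    false_alerts = fpr * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)
```

With half of all files malicious, about 99% of alerts are real. If only 1 file in 1,000 were malware, `precision(0.99, 0.01, 0.001)` comes to roughly 0.09, meaning about nine in ten alerts would be false positives.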
  30. 99% TRUE POSITIVE RATE (plot: chance of at least one success for the adversary vs. number of attempts; 1% at a single attempt, >99.3% at 500 attempts)
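The curve on slide 30 follows from compounding: if each attempt is caught with probability 0.99, the chance that all n attempts are caught is 0.99ⁿ, so the adversary’s chance of at least one success is 1 − 0.99ⁿ. The sketch below assumes every attempt is independent and detected with the same probability, an idealization since real attacker variants may be correlated.

```python
import math


def adversary_success(tpr: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent samples goes undetected."""
    return 1.0 - tpr ** attempts


def attempts_needed(tpr: float, target: float) -> int:
    """Smallest number of independent attempts giving the adversary >= target success."""
    return math.ceil(math.log(1.0 - target) / math.log(tpr))
```

At a 99% true positive rate a single attempt succeeds 1% of the time, 69 attempts already push the adversary past even odds, and 500 attempts exceed 99.3%, matching the slide.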
  31. DIFFERENCE IN DISTRIBUTIONS: Training set distribution generally differs from… • Real-world distribution (customer networks) • Evaluations (what customers test) • Testing houses (various third-party testers with varying methodologies) • Community resources (e.g. user submissions to the CrowdStrike scanner on VirusTotal)
  32. FIELD DISTRIBUTION (plot legend: Clean, Malware Type A, Malware Type B)
  33. FIELD DISTRIBUTION
  34. TEST DISTRIBUTION
  35. COMBINED DATA • More accurate • Fares better on test data • But: not as good on field data
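One reason distributions matter: a model’s overall accuracy is a class-mix-weighted average of its accuracy on each class, so the same model scores differently on field data and on a test set with a different mix of clean files and malware types. The sketch below illustrates this with entirely hypothetical per-class accuracies and class mixes; none of the numbers come from the slides.

```python
from typing import Dict


def expected_accuracy(per_class_accuracy: Dict[str, float], class_mix: Dict[str, float]) -> float:
    """Overall accuracy = sum over classes of (share of that class) * (accuracy on it)."""
    total = sum(class_mix.values())  # normalize in case the shares do not sum to exactly 1
    return sum(share / total * per_class_accuracy[cls] for cls, share in class_mix.items())


# Hypothetical accuracies of one fixed model on each class.
per_class = {"clean": 0.999, "malware_a": 0.95, "malware_b": 0.70}
# Hypothetical class mixes: mostly clean files in the field, malware-heavy in a test set.
field_mix = {"clean": 0.98, "malware_a": 0.015, "malware_b": 0.005}
test_mix = {"clean": 0.30, "malware_a": 0.20, "malware_b": 0.50}
```

Here the identical model scores about 99.7% on the field mix but about 84.0% on the test mix, purely because the test set over-represents the class it handles worst. Training choices that raise the test score can therefore move field performance in a different direction.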
  36. STOPPING MALWARE IS NOT ENOUGH (chart: threat sophistication, low to high, vs. how hard to prevent and detect, low to high; malware: 40%)
  37. YOU NEED COMPLETE BREACH PREVENTION (chart: threat sophistication vs. how hard to prevent and detect; malware: 40%, non-malware attacks: 60%; adversaries: cyber-criminals, hacktivists/vigilantes, terrorists, organized criminal gangs, nation-states)
  38. NEXT-GENERATION ENDPOINT PROTECTION: Cloud Delivered. Enriched by Threat Intelligence. • MANAGED HUNTING • ENDPOINT DETECTION AND RESPONSE • NEXT-GEN ANTIVIRUS
  39. KEY POINTS
     • Machine learning is an effective tool against unknown threats
     • Trading off true positives and false positives
     • Features matter, but don’t count them
     • One of many uses is static analysis
     • Detecting 99% of malware means an APT that keeps trying is all but certain (over 99.3% after 500 attempts) to get malware into your environment
     • For ML, distributions matter
     • The majority of intrusions are not malware-based
     • Avoid silent failure
     • Use a comprehensive array of techniques