Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)


Follow along with the R Markdown file at

Source code available at:

Full Abstract:

Threat Intelligence feeds are now being touted as the saving grace for SIEM and log management deployments, and as a way to supercharge incident detection and even response practices. We have heard similar promises before as an industry, so it is only fair to try to investigate. Since the actual number of breaches and attacks worldwide is unknown, it is impossible to measure how good threat intelligence feeds really are, right? Enter a new scientific breakthrough developed over the last 300 years: statistics!

This presentation will consist of a data-driven analysis of a cross-section of threat intelligence feeds (both open-source and commercial) to measure their statistical bias, their overlap, and how well they represent the unknown population of breaches worldwide. Are they a statistically good measure of the population of "bad stuff" happening out there? Is there even such a thing? How tuned to your specific threat surface are those feeds anyway? Regardless, can we actually make good use of them even if the threats they describe have no overlap with the actual incidents you have been seeing in your environment?

We will provide an open-source tool for attendees to extract, normalize and export data from threat intelligence feeds to use in their internal projects and systems. It will be pre-configured with current OSINT network feeds and easily extensible to private or commercial feeds. All the statistical code written and research data used (from the open-source feeds) will be made available in the spirit of reproducible research. Attendees will be able to use the tool itself to perform the same type of tests on their own data.

Join Alex and Kyle on a journey through the actual real-world usability of threat intelligence to find out which mix of open source and private feeds is right for your organization.

Published in: Data & Analytics


  1. Measuring the IQ of your Threat Intelligence Feeds (#TIQtest)
       • Alex Pinto, MLSec Project, @alexcpsec, @MLSecProject
       • Kyle Maxwell, Researcher, @kylemaxwell
  2. whoami(s)
     Alex Pinto
       • Science guy at MLSec Project
       • ML trainer
       • Network security aficionado
       • Tortured by SIEMs as a child
       • Hacker Spirit Animal™: CAFFEINATED CAPYBARA
     Kyle Maxwell
       • Researcher at [REDACTED]
       • Math Smuggler
       • Recovering Incident Responder
       • GPL zealot
       • Hacker Spirit Animal™: AXIOMATIC ARMADILLO
     (https:// (http:// gallery.php?tag=mammals&name=armadillo)
  3. Agenda
       • Threat Intel 102
       • Measuring Intelligence
       • Data Preparation
       • Testing the Data
       • Tools:
           • COMBINE
           • TIQ-TEST
       • Some parting ideas
     (http://-test.html)
  4. Threat Intel 102: Capability and Intent
       • What are they able to do?
       • What are they intending to do?
  5. Threat Intel 102: Cage Matches
       • Signatures vs Indicators
       • Data vs Intelligence
       • Tactical vs Strategic
       • Atomic vs Composite
  6. Threat Intel 102: Pyramid of Pain
     “Simple” and “easy” aren’t always (David Bianco – Pyramid of Pain)
  7. What about IP addresses?
       • Approximately same value as hostnames (APT vs DGA)
       • Finite resource (until IPv6, that is)
       • Managed / controlled by orgs
       • Difficulty / economic incentives / implied “cost”
       • Also, recyclable
     (https://
  8. Given IP addresses harvested from TI feeds, can we measure how much they “help” our defense metrics?
  9. Introducing TIQ-TEST
       • All these tests are available as R functions at
           • https://-test
           • Have fun, prove me wrong, suggest stuff
       • Tools that implement those tests
       • Sample data + R Markdown file
       • The excuse to learn a statistical language you were waiting for!
  10. Data Sources – Types of data
       • Extract the “raw” information from indicator feeds
       • Both IP addresses and hostnames were extracted
  11. Data Sources – Feeds Selected
       • Data was separated into “inbound” and “outbound”
  12. Data Preparation and Cleansing
       • Convert the hostname data to IP addresses:
           • Active IP addresses for the respective date (“A” query)
           • Passive DNS from Farsight Security (DNSDB)
       • We removed non-public IPs from the dataset (RFC1918)
           • Yeah, we know it is a “parking technique”
     (https://
  13. Data Preparation and Cleansing
       • For each IP record (including the ones from hostnames):
           • Add asnumber and asname (from MaxMind ASN DB)
           • Add country (from MaxMind GeoLite DB)
           • Add rhost (again from DNSDB) – most popular “PTR”
       • The experiments will be around ASNs and Geolocation
  14. Data Preparation and Cleansing
       • However, we will NOT be using maps. Just let it go.
  15. Data Preparation and Cleansing
       • Small enriched sample:
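
The RFC1918 cleanup step from the Data Preparation slides could look roughly like the sketch below. This is only an illustration in Python using the standard `ipaddress` module, not the code used for the talk; the function names `is_rfc1918` and `drop_private` are made up for this example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def drop_private(addresses):
    """Keep only the addresses that are outside the RFC 1918 ranges (hypothetical helper)."""
    return [address for address in addresses if not is_rfc1918(address)]


if __name__ == "__main__":
    sample = ["10.1.2.3", "8.8.8.8", "172.16.5.4", "192.168.1.1"]
    print(drop_private(sample))
```

The check is deliberately limited to the three RFC 1918 ranges named on the slide; loopback, link-local and other reserved ranges would need their own entries.
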
  16. Testing the Data
       • Let’s generate some interesting metrics:
           • NOVELTY – How often do they update themselves?
           • OVERLAP – How do they compare to what you got?
           • POPULATION – what is in them anyway?
       • Population is tricky:
           • Could mean the entire world (all IPv4 space)
           • Should ideally mean YOUR world
  17. But WHAT IS THE IQ?!??1?
       • We will withhold judgment
       • The best data composition is the best one for you
       • We will do our best to explain results so you can decide.
       • Maybe on further (or more private) research…
  18. Novelty Test – measuring added and dropped indicators
  19. Novelty Test (1)
  20. Novelty Test (2)
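
The idea behind the Novelty Test (added and dropped indicators between two snapshots of the same feed) can be sketched with plain set differences. This is a Python illustration, not the TIQ-TEST R code; the function name `novelty` and the two ratio fields are assumptions made for this example.

```python
def novelty(previous, current):
    """Compare two snapshots of one feed and count added and dropped indicators.

    `previous` and `current` are iterables of indicators (for example IP
    strings) seen on two consecutive dates. The ratios are an assumed
    normalization: added relative to the new snapshot, dropped relative
    to the old one.
    """
    prev_set, curr_set = set(previous), set(current)
    added = curr_set - prev_set
    dropped = prev_set - curr_set
    return {
        "added": len(added),
        "dropped": len(dropped),
        "added_ratio": len(added) / len(curr_set) if curr_set else 0.0,
        "dropped_ratio": len(dropped) / len(prev_set) if prev_set else 0.0,
    }


if __name__ == "__main__":
    day_1 = ["1.1.1.1", "2.2.2.2", "3.3.3.3"]
    day_2 = ["2.2.2.2", "3.3.3.3", "4.4.4.4", "5.5.5.5"]
    print(novelty(day_1, day_2))
```

A feed whose added and dropped counts stay at zero day after day is not updating itself; one that churns almost completely every day is a different kind of warning sign.
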
  21. Overlap Test – More data is better, but make sure it is not the same data
  22. Overlap Test
  23. Overlap Test
  24. Overlap Test
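
One way to picture the Overlap Test is a pairwise matrix: for each feed, what fraction of its indicators also shows up in each other feed? The Python sketch below is an assumption about how such a matrix could be built, not the TIQ-TEST implementation; `overlap_matrix` and the feed names are hypothetical.

```python
def overlap_matrix(feeds):
    """Return matrix[a][b] = share of feed a's indicators that also appear in feed b.

    `feeds` maps a feed name to an iterable of indicators. The matrix is
    not symmetric: a small feed can be fully contained in a large one
    while the large one barely overlaps the small one.
    """
    sets = {name: set(indicators) for name, indicators in feeds.items()}
    matrix = {}
    for a, a_set in sets.items():
        matrix[a] = {}
        for b, b_set in sets.items():
            if a == b:
                continue
            matrix[a][b] = len(a_set & b_set) / len(a_set) if a_set else 0.0
    return matrix


if __name__ == "__main__":
    feeds = {
        "feed_a": ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"],
        "feed_b": ["3.3.3.3", "4.4.4.4"],
        "your_incidents": ["9.9.9.9"],
    }
    for name, row in overlap_matrix(feeds).items():
        print(name, row)
```

Adding your own incident data as one more "feed" gives the "how do they compare to what you got?" view from the Testing the Data slide.
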
  25. Population Test
       • Let us use the ASN and GeoIP databases that we used to enrich our data as a reference of the “true” population.
       • But, but, human beings are unpredictable! We will never be able to forecast this!
  26. Can we get a better look?
       • Don’t like squinting either
       • Statistical inference-based comparison models (hypothesis testing)
           • Exact binomial tests (when we have the “true” pop)
           • Chi-squared proportion tests (similar to independence tests)
  27. Can we get a better look?
       • We can better estimate, with confidence intervals, our measures of error.
       • Also, p-values! (with apologies to Alex Hutton)
       • We promise to be very conservative in using them.
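
To make the exact binomial test concrete: suppose a country (or ASN) holds a known share `p` of the reference population, and `k` of the `n` indicators in a feed fall inside it. The sketch below computes a two-sided exact p-value in plain Python. It is a stand-in written for this transcript, not the R code from TIQ-TEST; the function names are made up, and the two-sided rule used here (summing every outcome that is no more likely than the observed one) is one common convention.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def exact_binomial_pvalue(k, n, p):
    """Two-sided exact binomial p-value for observing k of n when the true share is p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    # Small relative tolerance so ties with the observed outcome are counted.
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)
    total = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1)) if pmf <= threshold
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical numbers: 300 of 1000 feed IPs sit in a country that
    # holds 10% of the reference population.
    print(exact_binomial_pvalue(300, 1000, 0.10))
```

A very small p-value only says the feed's share differs from the reference share; whether that difference matters still depends on your own environment.
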
  28. Hacker Spirit Animal™ Guide
       • US – Eagle
       • CA – Moose
       • FR – Frog
       • GB – Bulldog
       • AU – Koala
       • BR – Capybara / Toucan
       • Texas – Armadillo
       • Disclaimer: we do not endorse Geolocation-based attribution
  29. Trend Comparison
  30. Introducing COMBINE
       • Harvesting feeds takes some work.
       • Most of us let somebody else do it without thinking about what it actually takes.
  31. Introducing COMBINE
     https://
  32. Introducing COMBINE
       • Components:
           1. Reaper gathers the threat data directly from feeds.
           2. Thresher normalizes it into a simplistic data model.
           3. Winnower optionally performs basic validation or enrichment.
           4. Baler transforms the data into CybOX, CSV, JSON, and CIM. (Only CSV and JSON work right now). Could also write others fairly easily. (nudge nudge, wink wink)
  33. Introducing COMBINE
       • Always trying to feed it more. Lots of possibilities, including your own data sources.
       • We clearly do NOT endorse any included feeds.
  34. Introducing COMBINE
       • Enrichments - think metadata.
           • AS, geolocation
           • DNS resolutions courtesy of Farsight DNSDB
               • Ask them for an API key to test it, tell them Alex Pinto sent you ;)
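
The four COMBINE stages can be pictured as a small pipeline of generators. The Python sketch below only mirrors the division of labour described on the Components slide and is not COMBINE's actual code: the function names, the three-field record layout and the in-memory "feeds" (instead of real downloads) are all assumptions made for this example.

```python
import csv
import io
import ipaddress
import json

FIELDS = ["indicator", "type", "source"]  # assumed minimal data model


def reap(raw_feeds):
    """Reaper stand-in: yield (source, line) pairs from already-fetched feed text."""
    for source, text in raw_feeds.items():
        for line in text.splitlines():
            yield source, line


def thresh(lines):
    """Thresher stand-in: strip comments and normalize each line into a record."""
    for source, line in lines:
        value = line.split("#", 1)[0].strip().lower()
        if not value:
            continue
        try:
            ipaddress.ip_address(value)
            kind = "ip"
        except ValueError:
            kind = "fqdn"
        yield {"indicator": value, "type": kind, "source": source}


def winnow(records):
    """Winnower stand-in: drop duplicates, private IPs and dotless hostnames."""
    seen = set()
    for record in records:
        key = (record["indicator"], record["source"])
        if key in seen:
            continue
        seen.add(key)
        if record["type"] == "ip" and ipaddress.ip_address(record["indicator"]).is_private:
            continue
        if record["type"] == "fqdn" and "." not in record["indicator"]:
            continue
        yield record


def bale(records, fmt="json"):
    """Baler stand-in: serialize records as JSON or CSV text."""
    records = list(records)
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    raw = {"feed_a": "# sample feed\n8.8.8.8\n10.0.0.1\nEvil.Example.COM # c2\n"}
    print(bale(winnow(thresh(reap(raw))), "csv"))
```

Keeping each stage as a generator means a new output format only touches the last function, which matches the slide's remark that other writers could be added fairly easily.
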
  35. MLSec Project
       • Both projects have been released as GPLv3 by MLSec Project
       • Will replace the internal versions we have on the main code
       • Looking for participants and data sharing agreements
       • Liked TIQ-TEST? We can benchmark your private feeds using these and other techniques
       • Visit https:// , message @MLSecProject or just e-mail me.
  36. Take Aways
       • Analyze your data.
       • Extract value from it!
       • Try before you buy! Different test results mean different things to different orgs.
       • Use the tools! Suggest new tests!
       • Share data with us! We take good care of it, make sure it gets proper exercise.
  37. Thanks!
       • Q&A?
       • Feedback!
     Alex Pinto, @alexcpsec, @MLSecProject
     Kyle Maxwell, @kylemaxwell
     “The measure of intelligence is the ability to change.” - Albert Einstein