
Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk Scoring


This session showcases how Splunk can be used to build a risk scoring engine designed to detect fraud and other suspicious activities. The presentation includes a real-world fraud detection use case, a detailed description of the searches and lookups that drive risk scoring, and other cyber security applications of risk scoring.



1. Copyright © 2014 Splunk Inc.
   Rob Perdue
   VP Prof Services, 8020 Labs
   Robert.perdue@8020labs.com
   Detect Fraud and Suspicious Events Using Risk Scoring
2. Introduction
   • Rob Perdue, VP Professional Services at 8020 Labs
     – Cyber security professional for 12 years
     – Specialize in Security Operations, DFIR in the financial sector
     – Previously held positions at IBM, ADP, Viacom and ThreatGRID
     – Splunking since 2008
3. Agenda
   • What I hope you will learn
   • Why am I talking about fraud?
   • Case Study: W-2 fraud
   • Fraud Detection Framework (FDF)
   • Creating Baselines
   • Risk Scoring
   • Cyber use cases for FDF
   • Key takeaways
   • Q & A
4. What I Hope You Will Learn
   • New and exciting ways to mine your data
   • The power of the eval command to score risk
   • The usefulness of lookup tables for baselining
     – inputlookup
     – outputlookup
   • Different ways to detect suspicious activities
5. Why Am I Talking About Fraud?
   • Contacted to assist in an IR investigation
   • Turned out not to be a typical IR engagement
   • Ever hear of W-2 fraud? I hadn't.
     – Steal a W-2 and file taxes before the real person does
6. Case Study: W-2 Fraud
   • Tasked with finding unauthorized access to W-2s
     – During tax season
   • Huge amount of data
     – Millions of rows of logs
   • Relevant logs spread across several database tables and files
   • Not really sure what W-2 fraud looked like
7. Case Study: W-2 Fraud
   • How the data was distributed:
     – Summary tables
     – Main DB
     – Stand-alone Splunk
     – Several CSV files
8. Case Study Con't
   • An idea… consolidate data into a single Splunk instance
   • No signature for fraud, no problem
   • Score a risk value for each W-2 transaction
     – Country of origin
     – Uniqueness of source IP
     – Day of week
     – History of IP
   • All of that resulted in one ugly search…
9. Case Study Con't
   • One ugly search…

   index=w2 source="summarytable.csv" webpage="*administrator*"
   | eval daymonth=date_month+date_mday
   | eval full_user=username+"@"+group | eval full_user=lower(full_user)
   | iplocation src
   | stats values(Country) AS Country values(Region) AS State values(City) AS City values(date_wday) AS Day dc(daymonth) AS Unique_Days count as user_ip_count by src, full_user
   | join full_user [search index=w2 source="summarytableall.csv" webpage="*administrator*"
     | eval full_user=username+"@"+group | eval full_user=lower(full_user)
     | stats count as total_W2_events by full_user]
   | eval traffic_per_ip=round((user_ip_count/total_W2_events)*100)
   | join full_user src [search index=w2_history
     | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user
     | fields src, full_user, days_seen, hist_total_count]
   | eval Risk_Score=0
   | eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
   | eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
   | eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
   | eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
   | eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
   | eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
   | eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
   | eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)
   | fields full_user, src, Country, State, City, Risk_Score
   | sort -Risk_Score
  10. 10. Let’s  Break  it  Down   10   index=w2  source="summarytable.csv"  webpage="*administrator*"     |eval  daymonth=date_month+date_mday   |eval  full_user=username+"@"+group   |eval  full_user=lower(full_user)       |iplocaKon  src   |stats  values(Country)  AS  Country  values(Region)  AS  State  values(City)  AS  City  values(date_wday)   AS  Day  dc(daymonth)  AS  Unique_Days  count  as  user_ip_count  by  src,  full_user  
  11. 11. Let’s  Keep  Breaking  it  Down   11   |join  full_user  [search  index=w2  source="  summarytableall.csv"  webpage="*administrator*"       |  eval  full_user=username+"@"+group       |  eval  full_user=lower(full_user)     |stats  count  as  total_W2_events  by  full_user]   |eval  traffic_per_IP=round((user_ip_count/total_W2_events)*100)     Should  have  used  the  eventstats  funcKon…more  on  that  later.  
12. …and Down

    | join full_user src [search index=w2_history
      | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user
      | fields src, full_user, days_seen, hist_total_count]
13. …and Down

    | eval Risk_Score=0
    | eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
    | eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
    | eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
    | eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
    | eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
    | eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
    | eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
    | eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)

    And finally…

    | fields full_user, src, Country, State, City, Risk_Score
    | sort -Risk_Score
  14. 14. Where’s  the  Magic?   14   !   CreaKon  of  a  composite  event   –  Join   –  Stats   !   Use  of  eval  to  score  the  event   –  |eval  Risk_Score=if(traffic_per_ip  ==100  AND  days_seen<14,  Risk_Score+1,Risk_Score+0)   !   Know  the  data   –  What  did  the  URL  for  W-­‐2  access  look  like?   –  What  could  I  extract  from  the  logs  to  build  a  profile?        
15. Closing the Case Study
    • It worked, but…
    • Reactive in nature
    • Not terribly efficient
    • Risk scoring could be better
    • Spawned the Fraud Detection Framework (FDF)
16. Fraud Detection Framework
    • Utilize everything you can from a single log event
      – Timestamp
      – Time of Day
      – User Agent String
      – URL
      – IP Info
      – User Name
    • Enrich the log
      – Eventtypes
      – GeoIP
      – IP History
      – User History
      – Watch lists
      – Tags
    • Continuous Baselining
    • Risk Scoring
  17. 17. What’s  in  a  Log?   17   2002-­‐05-­‐02  17:42:15  172.22.255.255  -­‐  172.30.255.255  80  GET  /images/picture.jpg  robper  200   Mozilla/4.0+(compaKble;MSIE+5.5;+Windows+2000+Server)   Day  of  Week   Time  of  Day   Source  IP   Method   URI  Stem   User  Agent   Server  IP   User  Name  
18. Enriching Your Logs
    • EventTypes/Tags
      – What kind of transaction was this?
    • GeoIP (iplocation)
      – Where is this IP coming from?
    • IP History
      – Have I ever seen this IP before?
    • User History
      – When's the last time I've seen this ID?
      – Is this an inactive account?
    • User Agent String
      – Is this UAS unusual?
      – Have I seen it before from this user?
      – Is there a non-English language preference?
    • Watch lists
      – Is this IP on any threat or fraud watchlists?
19. Building Event Types
    • No need to score a GET request to a jpg file
    • Fully understand the application you are scoring
      – App dev guys are our friends
      – Don't assume you know what a particular URL is, or isn't, for
    • Build eventtypes for transactions of interest (example stanzas below)
      – W-2 reports
      – Payroll Execution
      – Beneficiary Change
      – Direct Deposit Change
      – Successful Logons
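    Event types live in eventtypes.conf and get their tags from tags.conf, which is what lets later searches filter on tag=w2 or tag=payroll. A hypothetical sketch; the index, URI patterns, and stanza names are assumptions for illustration:

    # eventtypes.conf -- hypothetical stanzas for two transactions of interest
    [w2_report]
    search = index=hrapp uri_stem="*w2*" method=GET status=200

    [direct_deposit_change]
    search = index=hrapp uri_stem="*directdeposit*" method=POST status=200

    # tags.conf -- tag the event types so searches can use tag=w2, tag=payroll
    [eventtype=w2_report]
    w2 = enabled

    [eventtype=direct_deposit_change]
    payroll = enabled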
20. Baselining
    • What does this usually look like?
    • Enables risk scoring
    • Relies heavily on lookup tables
    • Lesser-known lookup commands
      – inputlookup
      – outputlookup
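    The baseline searches on the coming slides both follow the same loop: aggregate the new events, append the stored baseline with inputlookup append=T, re-aggregate the combined set, and write it back with outputlookup. Stripped to its skeleton (index, fields, and file name are illustrative):

    index=hrapp
    | stats min(_time) AS firstTime max(_time) AS lastTime count by user, src
    | inputlookup append=T my_baseline.csv
    | stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count by user, src
    | outputlookup my_baseline.csv

    Run on a schedule, this keeps first-seen/last-seen history indefinitely while only ever searching the most recent time window.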
21. FDF: Baselines
    • GeoIP
      – Where does this client usually log in from?
    • User Profiles
      – User Agent String
      – IP Info
      – User Logon History
22. FDF: GeoIP
    • Determine primary location of client
    • Feeds into the Haversine formula (shown below)
      – https://apps.splunk.com/app/936/
    • Scheduled search
    • Utilizes inputlookup and outputlookup
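    For reference, the haversine formula computes the great-circle distance $d$ between the client's baseline location $(\varphi_1, \lambda_1)$ and the observed login $(\varphi_2, \lambda_2)$, with latitudes and longitudes in radians, on a sphere of radius $r$ (roughly 3959 mi when unit=mi):

    $$d = 2r \arcsin\!\left(\sqrt{\sin^2\!\frac{\varphi_2-\varphi_1}{2} + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\frac{\lambda_2-\lambda_1}{2}}\right)$$

    The Splunk app linked above implements this as the haversine command used in the risk engine on slide 31.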
23. FDF: GeoIP
    • GeoIP baseline search:

    index=hrapp | iplocation allfields=true src
    | eval clientlat=lat | eval clientlon=lon
    | stats min(_time) AS firstTime max(_time) AS lastTime count by client, Region, Timezone, clientlat, clientlon
    | eventstats sum(count) as client_total by client
    | inputlookup append=T client_geoProfiles.csv
    | eventstats sum(client_total) AS client_total by client, Region, Timezone, clientlat, clientlon
    | stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count by client_total, client, Region, Timezone, clientlat, clientlon
    | eval percent=round((count/client_total)*100)
    | outputlookup client_geoProfiles.csv
    | where percent>75 | outputlookup client_geoBase.csv
  24. 24. Let’s  Break  it  Down   24   index=hrapp|iplocaKon  allfields=true  src   |eval  clientlat=lat|eval  clientlon=lon   |  stats  min(_Kme)  AS  firstTime  max(_Kme)  AS  lastTime  count  by   client,Region,Timezone,clientlat,clientlon     |eventstats  sum(count)  as  client_total  by  client   |  inputlookup  append=T  client_geoProfiles.csv   |eventstats  sum(client_total)  AS  client_total  by  client,Region,Timezone,clientlat,clientlon   |stats  min(firstTime)  AS  firstTime  max(lastTime)  AS  lastTime  sum(count)  AS  count  by   client_total,  client,Region,Timezone,clientlat,clientlon   |eval  percent=round((count/client_total)*100)   |outputlookup  client_geoProfiles.csv   |where  percent>75  |outputlookup  client_geoBase.csv   How  this  data  is   used  is  shown  on   slide  32  
25. How it Looks…
26. FDF: User Baseline
    • Create profiles for each user
      – First/Last Time
      – User Agent String
      – IP Address
    • Scheduled search
    • Utilizes inputlookup and outputlookup
27. FDF: User Baseline
    • User baseline search:

    index=hrapp | fillnull value=unknown tag::src
    | stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday by user, client, src, user_agent, tag::src, tag
    | inputlookup append=T user_Profiles.csv
    | stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday by user, client, src, user_agent, tag::src, tag
    | outputlookup user_Profiles.csv
28. Breaking it Down

    index=hrapp | fillnull value=unknown tag::src
    | stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday by user, client, src, user_agent, tag::src, tag
    | inputlookup append=T user_Profiles.csv
    | stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday by user, client, src, user_agent, tag::src, tag
    | outputlookup user_Profiles.csv

    How this data is used is shown on slide 32.
29. How it Looks
30. FDF: Risk Engine
    • Anomaly detection using the baseline data
    • Enriches the log data
      – Watchlists
      – Tags
      – Haversine
31. FDF: Risk Engine

    | inputlookup user_Profiles.csv
    | search tag=w2 OR tag=payroll
    | lookup client_geoBase.csv client OUTPUT clientlat, clientlon
    | iplocation allfields=true src
    | lookup threatlist ip as src OUTPUT description
    | eval short_lon=round(lon, 2) | eval short_lat=round(lat, 2)
    | eval c_lon=round(clientlon, 2) | eval c_lat=round(clientlat, 2)
    | strcat c_lat "," c_lon as latlon
    | strcat short_lat "," short_lon as latlon2
    | haversine originField=latlon latlon2 unit=mi
    | eval diff=(round((lastTime-firstTime)/86400))
    | eval risk=0
    | eval risk=if(distance>0 AND distance<300, risk+5, risk+0)
    | eval risk=if(distance>299, risk+15, risk+0)
    | eval risk=if(diff<5, risk+10, risk+0)
    | eval risk=if(Country!="United States", risk+50, risk+0)
    | eval risk=if('tag::src'="malicious", risk+30, risk+1)
    | eval risk=if(weekday="Saturday" OR weekday="Sunday", risk+10, risk+1)
    | eval risk=if(description="KnownBad", risk+10, risk+0)
    | eval risk=if('tag::src'="whitelisted", risk-10, risk+1)
    | eval risk=if(risk<0, 1, risk+0)
    | eval distance=round(distance)
    | fields src, Country, Region, distance, client, user, tag::src, description, tag, risk
    | search risk>0
  32. 32. Let’s  Break  it  Down   32   |inputlookup  user_Profiles.csv       |search  tag=w2  OR  tag=payroll   |lookup  client_geoBase.csv  client  OUTPUT  clientlat,clientlon   |iplocaKon  allfields=true  src   |lookup  threatlist  ip  as  src  OUTPUT  descripKon   |  eval  short_lon=round(lon,  2)   |  eval  short_lat=round(lat,  2)   |eval  c_lon=round(clientlon,  2)   |  eval  c_lat=round(clientlat,  2)   |strcat  c_lat  ","  c_lon  as  latlon   |  strcat  short_lat  ","  short_lon  as  latlon2   |  haversine  originField=latlon  latlon2  unit=mi   From  Slide  28   From  Slide  24  
  33. 33. Let’s  Keep  Breaking  it  Down…   33   |eval  diff=(round((lastTime-­‐firstTime)/86400))   |eval  risk=0   |eval  risk=if(distance>0  AND  disance<300,  risk+5,  risk+0)   |eval  risk=if(distance>299,  risk+15,  risk+0)   |eval  risk=if(diff<5,  risk+10,  risk+0)   |eval  risk=if(Country!="United  States",  risk+50,  risk+0)   |eval  risk=if('tag::src'="malicious",  risk+29,  risk+1)   |eval  risk=if(weekday="Saturday"  OR  weekday="Sunday",  risk+10,  risk+1)   |eval  risk=if(descripKon="KnownBad",  risk+10,  risk+0)   |eval  risk=if('tag::src'="whitelisted",  risk-­‐10,  risk+1)   |eval  risk=if(risk<0,  1,  risk+0)   |eval  distance=round(distance)   |fields  src,Country,Region,distance,  client,  user,  tag::src,descripKon,risk   |search  risk>0  
34. What it Looks Like…
35. FDF: Scoring Review
    • In its current state:
      – Essentially scores the risk of the session
      – Can focus the score on particular event types (e.g., direct deposit, payroll)
      – Does not score behavior while in the app
      – Good job of detecting compromised creds
    • Can easily be modified to…
      – Detect transaction anomalies (e.g., wire transfers, payroll fraud)
      – Incorporate Benford's law (sketch below)
        – http://apps.splunk.com/app/355/
      – Score other risks
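    Benford's law predicts that the leading digit d of naturally occurring amounts appears with probability log10(1 + 1/d), so fabricated transactions often skew the distribution. A hypothetical sketch of the comparison in SPL; the index and the amount field are assumptions:

    index=payroll amount>0
    | eval first_digit=substr(tostring(floor(amount)), 1, 1)
    | stats count by first_digit
    | eventstats sum(count) as total
    | eval observed_pct=round((count/total)*100, 1)
    | eval expected_pct=round(log(1+1/tonumber(first_digit), 10)*100, 1)
    | eval deviation=observed_pct-expected_pct
    | sort first_digit

    A large positive deviation on a digit (say, a spike of amounts starting with 9) is what you would feed into the risk score as another chained eval.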
36. FDF: Other Cyber Use Cases
    • Compromised creds
      – FTP
      – OWA
      – VPN
      – Custom apps
    • User profiles
      – Proxy logs
      – Logon times
    • Risk scoring
      – IPS Alert + AV Hit + Failed Logon + ? (sketch below)
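    That last bullet is the same chained-scoring idea applied across data sources: weight each signal, sum per source address, and alert on the total. A hypothetical sketch; the index names, field values, weights, and threshold are all assumptions:

    (index=ips severity=high) OR (index=antivirus action=detected) OR (index=auth action=failure)
    | eval signal_score=case(index="ips", 15, index="antivirus", 10, index="auth", 5)
    | stats sum(signal_score) as risk dc(index) as distinct_sources values(index) as sources by src
    | eval risk=if(distinct_sources>=3, risk+20, risk)
    | where risk>=30
    | sort -risk

    The bonus for hitting all three sources rewards corroboration: one noisy IPS alert stays below the threshold, but an IPS alert plus an AV hit plus failed logons from the same src rises to the top.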
37. FDF: Side Story
    • One compromised FTP account reported
      – The client wanted to know how many other accounts were used for unauthorized access
      – ~600 active FTP accounts
    • Fortunately the client had a year's worth of FTP logs in Splunk
    • Utilized the FDF framework to detect 14 additional accounts
38. Key Takeaways
    • Baseline your data
    • inputlookup and outputlookup are very powerful baselining tools
    • Chaining eval statements is an effective way of scoring risk
    • Use every bit of information found in an individual log
    • Enrich what you can
39. Q&A
40. Security office hours: 11:00 AM – 2:00 PM @ Room 103, every day
      Geek out, share ideas with Enterprise Security developers
    Red Team / Blue Team – Challenge your skills and learn new tricks
      Mon–Wed: 3:00 PM – 6:00 PM @ Splunk Community Lounge
      Thurs: 11:00 AM – 2:00 PM
      Learn, share and hack
    Birds of a feather – Collaborate and brainstorm with security ninjas
      Thurs: 12:00 PM – 1:00 PM @ Meal Room
41. THANK YOU
