Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk Scoring


This session showcases how Splunk can be used to build a risk scoring engine designed to detect fraud and other suspicious activities. The presentation includes a real-world fraud detection use case, a detailed description of the searches and lookups that drive risk scoring, and other cyber security applications of risk scoring.

  1. Detect Fraud and Suspicious Events Using Risk Scoring. Rob Perdue, VP Prof Services, 8020 Labs. Copyright © 2014 Splunk Inc.
  2. Introduction
     - Rob Perdue, VP Professional Services at 8020 Labs
       - Cyber security professional for 12 years
       - Specialize in Security Operations, DFIR in financial sector
       - Previously held positions at IBM, ADP, Viacom and ThreatGRID
       - Splunking since 2008
  3. Agenda
     - What I hope you will learn
     - Why am I talking about fraud?
     - Case Study: W-2 fraud
     - Fraud Detection Framework (FDF)
     - Creating Baselines
     - Risk Scoring
     - Cyber use cases for FDF
     - Key takeaways
     - Q & A
  4. What I Hope You Will Learn
     - New and exciting ways to mine your data
     - The power of the eval command to score risk
     - The usefulness of lookup tables for baselining
       - inputlookup
       - outputlookup
     - Different ways to detect suspicious activities
  5. Why Am I Talking About Fraud?
     - Contacted to assist in an IR investigation
     - Turned out not to be a typical IR engagement
     - Ever hear of W-2 fraud? I hadn’t.
       - Steal a W-2 and file taxes before the real person does
  6. Case Study: W-2 Fraud
     - Tasked with finding unauthorized access to W-2s
       - During tax season
     - Huge amount of data
       - Millions of rows of logs
     - Relevant logs spread across several database tables and files
     - Not really sure what W-2 fraud looked like
  7. Case Study: W-2 Fraud
     - How the data was distributed:
       - Summary Tables
       - Main DB
       - Stand-alone Splunk
       - Several CSV Files
  8. Case Study Cont’d
     - An idea… consolidate data into a single Splunk instance
     - No signature for fraud, no problem
     - Score a risk value for each W-2 transaction
       - Country of origin
       - Uniqueness of Source IP
       - Day of Week
       - History of IP
     - All of that resulted in one ugly search…
  9. Case Study Cont’d
     - One ugly search…

         index=w2 source="summarytable.csv" webpage="*administrator*" | eval daymonth=date_month+date_mday | eval full_user=username+"@"+group | eval full_user=lower(full_user) | iplocation src | stats values(Country) AS Country values(Region) AS State values(City) AS City values(date_wday) AS Day dc(daymonth) AS Unique_Days count as user_ip_count by src, full_user | join full_user [search index=w2 source="summarytableall.csv" webpage="*administrator*" | eval full_user=username+"@"+group | eval full_user=lower(full_user) | stats count as total_W2_events by full_user] | eval traffic_per_ip=round((user_ip_count/total_W2_events)*100) | join full_user src [search index=w2_history | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user | fields src,full_user,days_seen, hist_total_count] | eval Risk_Score=0 | eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3,Risk_Score+0) | eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1,Risk_Score+0) | eval Risk_Score=if(Day=="saturday" OR Day=="sunday",Risk_Score+1, Risk_Score+0) | eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0) | eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0) | eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0) | eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0) | eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1,Risk_Score+0) | fields full_user, src, Country, State, City, Risk_Score | sort -Risk_Score
  10. Let’s Break it Down

          index=w2 source="summarytable.csv" webpage="*administrator*"
          | eval daymonth=date_month+date_mday
          | eval full_user=username+"@"+group
          | eval full_user=lower(full_user)
          | iplocation src
          | stats values(Country) AS Country values(Region) AS State values(City) AS City values(date_wday) AS Day dc(daymonth) AS Unique_Days count as user_ip_count by src, full_user
  11. Let’s Keep Breaking it Down

          | join full_user [search index=w2 source="summarytableall.csv" webpage="*administrator*"
              | eval full_user=username+"@"+group
              | eval full_user=lower(full_user)
              | stats count as total_W2_events by full_user]
          | eval traffic_per_ip=round((user_ip_count/total_W2_events)*100)

      - Should have used the eventstats command… more on that later.
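The note on slide 11 about eventstats can be illustrated outside Splunk. The following is a minimal Python sketch, not the presenter's code: it computes each IP's share of a user's W-2 traffic in one pass over the grouped counts, which is the effect of adding a per-user total to every row instead of joining against a second search. The function name `traffic_share` and the sample users and addresses are hypothetical.

```python
from collections import Counter

def traffic_share(events):
    """events: iterable of (full_user, src) pairs, one per W-2 access.

    Returns {(full_user, src): percent of that user's events seen from src}.
    """
    # Roughly: stats count AS user_ip_count BY src, full_user
    per_pair = Counter(events)

    # Roughly: eventstats sum(user_ip_count) AS total_W2_events BY full_user
    per_user = Counter()
    for (user, _src), n in per_pair.items():
        per_user[user] += n

    # Roughly: eval traffic_per_ip=round((user_ip_count/total_W2_events)*100)
    return {
        (user, src): round(n / per_user[user] * 100)
        for (user, src), n in per_pair.items()
    }

# Hypothetical sample data (not from the talk).
events = (
    [("alice@acme", "10.0.0.1")] * 3
    + [("alice@acme", "203.0.113.9")]
    + [("bob@acme", "10.0.0.2")]
)
shares = traffic_share(events)
```

A user whose traffic all comes from one address ends up at 100, which is the value the scoring rules on slide 13 treat differently from a split profile.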
  12. …and Down

          | join full_user src [search index=w2_history
              | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user
              | fields src,full_user,days_seen, hist_total_count]
  13. …and Down

          | eval Risk_Score=0
          | eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3,Risk_Score+0)
          | eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1,Risk_Score+0)
          | eval Risk_Score=if(Day=="saturday" OR Day=="sunday",Risk_Score+1, Risk_Score+0)
          | eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
          | eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
          | eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
          | eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
          | eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1,Risk_Score+0)

      - And finally…

          | fields full_user, src, Country, State, City, Risk_Score
          | sort -Risk_Score
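To make the arithmetic of the chained eval statements on slide 13 easier to follow, here is a minimal Python sketch that applies the same rules in the same order. It is an illustration under simplifying assumptions: every input is a single value (the Splunk search collects `Day` with `values()`, which can be multivalued), and the function name `w2_risk_score` and the sample inputs are hypothetical.

```python
def w2_risk_score(traffic_per_ip, days_seen, day, unique_days, total_w2_events, country):
    """Apply the slide-13 scoring rules in order and return the total."""
    score = 0
    if traffic_per_ip < 100 and days_seen < 14:
        score += 3  # IP carries only part of the user's traffic and is new
    if traffic_per_ip == 100 and days_seen < 14:
        score += 1  # user's only IP, but still new
    if day in ("saturday", "sunday"):
        score += 1  # weekend access
    if unique_days == 1:
        score += 2  # this user/IP pair was seen on a single day
    if total_w2_events == 1:
        score += 2  # the user has a single W-2 event overall
    if country != "United States":
        score += 2  # access from abroad
    if days_seen > 60:
        score -= 3  # long history for this IP lowers the score
    if traffic_per_ip < 100 and days_seen > 13:
        score += 1  # partial traffic share, IP not new
    return score

# Hypothetical examples: a new foreign IP on a weekend vs. a long-established IP.
suspicious = w2_risk_score(20, 1, "sunday", 1, 5, "Romania")
routine = w2_risk_score(100, 90, "monday", 10, 40, "United States")
```

Because each rule only adds to or subtracts from a running total, new indicators can be added as further lines without touching the existing ones, which is the appeal of the chained-eval pattern.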
  14. Where’s the Magic?
      - Creation of a composite event
        - Join
        - Stats
      - Use of eval to score the event
        - |eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1,Risk_Score+0)
      - Know the data
        - What did the URL for W-2 access look like?
        - What could I extract from the logs to build a profile?
  15. Closing the Case Study
      - It worked, but…
      - Reactive in nature
      - Not terribly efficient
      - Risk scoring could be better
      - Spawned the Fraud Detection Framework (FDF)
  16. Fraud Detection Framework
      - Utilize everything you can from a single log event
        - Timestamp
        - Time of Day
        - User Agent String
        - URL
        - IP Info
        - User Name
      - Enrich the log
        - Eventtypes
        - GeoIP
        - IP History
        - User History
        - Watch lists
        - Tags
      - Continuous Baselining
      - Risk Scoring
  17. What’s in a Log?

          2002-05-02 17:42:15 - 80 GET /images/picture.jpg robper 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)

      - Field labels on the slide: Day of Week, Time of Day, Source IP, Method, URI Stem, User Agent, Server IP, User Name
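Two of the labels on slide 17, Day of Week and Time of Day, are not stored in the log line itself; they are derived from the timestamp. A minimal Python sketch of that derivation follows, assuming the `YYYY-MM-DD HH:MM:SS` layout shown on the slide. The function name `time_features` and the returned keys are hypothetical.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def time_features(timestamp):
    """Derive day-of-week and hour-of-day from a 'YYYY-MM-DD HH:MM:SS' string."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    day = WEEKDAYS[ts.weekday()]  # weekday() returns 0 for Monday
    return {
        "day_of_week": day,
        "hour": ts.hour,
        "is_weekend": day in ("Saturday", "Sunday"),
    }

# Timestamp taken from the sample log line on slide 17.
features = time_features("2002-05-02 17:42:15")
```

Inside Splunk the equivalent values are typically available as default date fields such as `date_wday`, which the searches in this deck already use.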
  18. Enriching Your Logs
      - EventTypes/Tags
        - What kind of transaction was this?
      - GeoIP (iplocation)
        - Where is this IP coming from?
      - IP History
        - Have I ever seen this IP before?
      - User History
        - When’s the last time I’ve seen this ID before?
        - Is this an inactive account?
      - User Agent String
        - Is this UAS unusual?
        - Have I seen it before from this user?
        - Is there a non-English language preference?
      - Watch lists
        - Is this IP on any threat or fraud watchlists?
  19. Building Event Types
      - No need to score a GET request to a jpg file
      - Fully understand the application you are scoring
        - App Dev guys are our friends
        - Don’t assume you know what a particular URL is, or isn’t, for
      - Build eventtypes for transactions of interest
        - W-2 reports
        - Payroll Execution
        - Beneficiary Change
        - Direct Deposit Change
        - Successful Logons
  20. Baselining
      - What does this usually look like?
      - Enables risk scoring
      - Relies heavily on lookup tables
      - Lesser known lookup commands
        - inputlookup
        - outputlookup
  21. FDF: Baselines
      - GeoIP
        - Where does this client usually log in from?
      - User Profiles
        - User Agent String
        - IP Info
        - User Logon History
  22. FDF: GeoIP
      - Determine primary location of client
      - Feeds into Haversine formula
        - https://
      - Scheduled search
      - Utilizes inputlookup and outputlookup
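For readers unfamiliar with the Haversine formula mentioned on slide 22, it gives the great-circle distance between two latitude/longitude points. The following is a minimal Python sketch of the standard formula, returning miles to match the `unit=mi` option used later in the deck. The function name, the Earth-radius constant and the sample coordinates are assumptions for illustration, not the implementation of the Splunk haversine command.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles (assumed constant)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

# Approximate New York to Los Angeles, as a sanity check.
ny_to_la = haversine_miles(40.7128, -74.0060, 34.0522, -118.2437)
```

In the framework, one point is the client's baseline location and the other is the location of the current source IP, so a large distance suggests a login from somewhere the client does not usually operate.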
  23. FDF: GeoIP
      - GeoIP Baseline Search:

          index=hrapp | iplocation allfields=true src | eval clientlat=lat | eval clientlon=lon | stats min(_time) AS firstTime max(_time) AS lastTime count by client,Region,Timezone,clientlat,clientlon | eventstats sum(count) as client_total by client | inputlookup append=T client_geoProfiles.csv | eventstats sum(client_total) AS client_total by client,Region,Timezone,clientlat,clientlon | stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count by client_total, client,Region,Timezone,clientlat,clientlon | eval percent=round((count/client_total)*100) | outputlookup client_geoProfiles.csv | where percent>75 | outputlookup client_geoBase.csv
  24. Let’s Break it Down

          index=hrapp
          | iplocation allfields=true src
          | eval clientlat=lat
          | eval clientlon=lon
          | stats min(_time) AS firstTime max(_time) AS lastTime count by client,Region,Timezone,clientlat,clientlon
          | eventstats sum(count) as client_total by client
          | inputlookup append=T client_geoProfiles.csv
          | eventstats sum(client_total) AS client_total by client,Region,Timezone,clientlat,clientlon
          | stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count by client_total, client,Region,Timezone,clientlat,clientlon
          | eval percent=round((count/client_total)*100)
          | outputlookup client_geoProfiles.csv
          | where percent>75
          | outputlookup client_geoBase.csv

      - Note: how this data is used is shown on slide 32
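The baseline search on slide 24 follows a read, merge, write pattern: load the stored profile, fold in the new counts, write the full profile back, then keep only the dominant location as the base. Below is a minimal Python sketch of one reading of that intent; it is not a line-for-line translation of the search. Locations are reduced to a region name, and the function name `update_geo_profiles`, the dictionary layout and the sample clients are hypothetical.

```python
from collections import defaultdict

def update_geo_profiles(stored, new_counts, threshold=75):
    """Merge new per-location counts into a stored profile.

    Both arguments map (client, region) -> event count.
    Returns (profiles, base): profiles holds count and percent for every
    location; base maps a client to its primary region when one location
    accounts for more than `threshold` percent of that client's events.
    """
    merged = defaultdict(int)
    for source in (stored, new_counts):  # like inputlookup append=T, then stats sum(count)
        for key, n in source.items():
            merged[key] += n

    totals = defaultdict(int)            # like eventstats sum(count) by client
    for (client, _region), n in merged.items():
        totals[client] += n

    profiles = {
        key: {"count": n, "percent": round(n / totals[key[0]] * 100)}
        for key, n in merged.items()
    }
    base = {
        client: region
        for (client, region), row in profiles.items()
        if row["percent"] > threshold    # like where percent>75
    }
    return profiles, base

# Hypothetical data: acme logs in mostly from NJ, globex is evenly split.
stored = {("acme", "NJ"): 80, ("acme", "NY"): 10, ("globex", "TX"): 5}
new_counts = {("acme", "NJ"): 8, ("acme", "CA"): 2, ("globex", "WA"): 5}
profiles, base = update_geo_profiles(stored, new_counts)
```

A client with no dominant location, like the second one in the example, simply has no entry in the base table, so distance-based rules cannot fire for it.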
  25. How it Looks…
  26. FDF: User Baseline
      - Create profiles for each user
        - First/Last Time
        - User Agent String
        - IP Address
      - Scheduled search
      - Utilizes inputlookup and outputlookup
  27. FDF: User Baseline
      - User baseline search:

          index=hrapp | fillnull value=unknown tag::src | stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday by user,client,src,user_agent,tag::src, tag | inputlookup append=T user_Profiles.csv | stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday by user,client,src,user_agent,tag::src,tag | outputlookup user_Profiles.csv
  28. Breaking it Down

          index=hrapp
          | fillnull value=unknown tag::src
          | stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday by user,client,src,user_agent,tag::src, tag
          | inputlookup append=T user_Profiles.csv
          | stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday by user,client,src,user_agent,tag::src,tag
          | outputlookup user_Profiles.csv

      - Note: how this data is used is shown on slide 32
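The second stats on slide 28 is what keeps the user profile continuous: for each profile key it takes the earliest first-seen time, the latest last-seen time and the set of weekdays across the stored and fresh rows. A minimal Python sketch of that merge follows, with hypothetical names (`merge_user_profiles`, the tuple key, epoch-second timestamps) and a reduced key of user, source IP and user agent.

```python
def merge_user_profiles(stored, fresh):
    """Merge fresh profile rows into stored ones.

    Each argument maps a profile key, here (user, src, user_agent), to a dict
    with firstTime, lastTime (epoch seconds) and weekday (a set of day names).
    """
    merged = {
        key: {"firstTime": row["firstTime"], "lastTime": row["lastTime"],
              "weekday": set(row["weekday"])}
        for key, row in stored.items()
    }
    for key, row in fresh.items():
        if key not in merged:
            # First sighting of this user/IP/agent combination.
            merged[key] = {"firstTime": row["firstTime"], "lastTime": row["lastTime"],
                           "weekday": set(row["weekday"])}
            continue
        current = merged[key]
        current["firstTime"] = min(current["firstTime"], row["firstTime"])  # min(firstTime)
        current["lastTime"] = max(current["lastTime"], row["lastTime"])     # max(lastTime)
        current["weekday"] |= set(row["weekday"])                           # values(weekday)
    return merged

# Hypothetical rows: one known profile seen again, one brand-new profile.
known = ("robper", "198.51.100.7", "Mozilla/4.0")
new = ("robper", "203.0.113.9", "curl/7.30")
stored = {known: {"firstTime": 1000, "lastTime": 5000, "weekday": {"monday"}}}
fresh = {
    known: {"firstTime": 6000, "lastTime": 7000, "weekday": {"tuesday"}},
    new: {"firstTime": 6500, "lastTime": 6500, "weekday": {"tuesday"}},
}
merged = merge_user_profiles(stored, fresh)
```

The gap between `firstTime` and `lastTime` is what the risk engine later uses to tell a long-standing profile from one that appeared only recently.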
  29. How it Looks
  30. FDF: Risk Engine
      - Anomaly detection using the baseline data
      - Enriches the log data
        - Watchlists
        - Tags
        - Haversine
  31. FDF: Risk Engine

          | inputlookup user_Profiles.csv | search tag=w2 OR tag=payroll | lookup client_geoBase.csv client OUTPUT clientlat,clientlon | iplocation allfields=true src | lookup threatlist ip as src OUTPUT description | eval short_lon=round(lon, 2) | eval short_lat=round(lat, 2) | eval c_lon=round(clientlon, 2) | eval c_lat=round(clientlat, 2) | strcat c_lat "," c_lon as latlon | strcat short_lat "," short_lon as latlon2 | haversine originField=latlon latlon2 unit=mi | eval diff=(round((lastTime-firstTime)/86400)) | eval risk=0 | eval risk=if(distance>0 AND distance<300, risk+5, risk+0) | eval risk=if(distance>299, risk+15, risk+0) | eval risk=if(diff<5, risk+10, risk+0) | eval risk=if(Country!="United States", risk+50, risk+0) | eval risk=if('tag::src'="malicious", risk+30, risk+1) | eval risk=if(weekday="Saturday" OR weekday="Sunday", risk+10, risk+1) | eval risk=if(description="KnownBad", risk+10, risk+0) | eval risk=if('tag::src'="whitelisted", risk-10, risk+1) | eval risk=if(risk<0, 1, risk+0) | eval distance=round(distance) | fields src,Country,Region,distance, client, user, tag::src,description,tag,risk | search risk>0
  32. Let’s Break it Down

          | inputlookup user_Profiles.csv
          | search tag=w2 OR tag=payroll
          | lookup client_geoBase.csv client OUTPUT clientlat,clientlon
          | iplocation allfields=true src
          | lookup threatlist ip as src OUTPUT description
          | eval short_lon=round(lon, 2)
          | eval short_lat=round(lat, 2)
          | eval c_lon=round(clientlon, 2)
          | eval c_lat=round(clientlat, 2)
          | strcat c_lat "," c_lon as latlon
          | strcat short_lat "," short_lon as latlon2
          | haversine originField=latlon latlon2 unit=mi

      - Callouts: user_Profiles.csv is from slide 28; client_geoBase.csv is from slide 24
  33. Let’s Keep Breaking it Down…

          | eval diff=(round((lastTime-firstTime)/86400))
          | eval risk=0
          | eval risk=if(distance>0 AND distance<300, risk+5, risk+0)
          | eval risk=if(distance>299, risk+15, risk+0)
          | eval risk=if(diff<5, risk+10, risk+0)
          | eval risk=if(Country!="United States", risk+50, risk+0)
          | eval risk=if('tag::src'="malicious", risk+29, risk+1)
          | eval risk=if(weekday="Saturday" OR weekday="Sunday", risk+10, risk+1)
          | eval risk=if(description="KnownBad", risk+10, risk+0)
          | eval risk=if('tag::src'="whitelisted", risk-10, risk+1)
          | eval risk=if(risk<0, 1, risk+0)
          | eval distance=round(distance)
          | fields src,Country,Region,distance, client, user, tag::src,description,risk
          | search risk>0
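A worked example helps show how the slide-33 weights combine. The Python sketch below mirrors the rules as written on that slide, including the else branches that add 1 and the final floor. It assumes every field is present and single-valued; the function name `fdf_risk` and the sample events are hypothetical.

```python
def fdf_risk(distance, diff_days, country, tag_src, weekday, description):
    """Score one profile row using the weights shown on slide 33."""
    risk = 0
    if 0 < distance < 300:
        risk += 5                                   # near, but not at, the baseline location
    if distance > 299:
        risk += 15                                  # far from the baseline location
    if diff_days < 5:
        risk += 10                                  # profile first seen less than 5 days before last seen
    if country != "United States":
        risk += 50
    risk += 29 if tag_src == "malicious" else 1     # note: the else branch adds 1
    risk += 10 if weekday in ("Saturday", "Sunday") else 1
    if description == "KnownBad":
        risk += 10                                  # source IP is on the threat list
    risk += -10 if tag_src == "whitelisted" else 1
    if risk < 0:
        risk = 1                                    # floor: negative totals become 1
    return risk

# Hypothetical rows.
quiet = fdf_risk(0, 200, "United States", "unknown", "Tuesday", None)
whitelisted = fdf_risk(0, 200, "United States", "whitelisted", "Tuesday", None)
hostile = fdf_risk(5000, 0, "Romania", "malicious", "Saturday", "KnownBad")
```

With these rules a row that matches nothing still scores 3, and a whitelisted row floors at 1, so under the assumptions above the closing `search risk>0` filter passes every row; ranking by score, rather than the filter, is what separates the outliers.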
  34. What it Looks Like…
  35. FDF: Scoring Review
      - In its current state:
        - Essentially scores the risk of the session
        - Can focus score on particular event types (e.g., direct deposit, payroll)
        - Does not score behavior while in the app
        - Good job of detecting compromised creds
      - Can easily be modified to…
        - Detect transaction anomalies (e.g., wire transfers, payroll fraud)
        - Incorporate Benford’s law
          - http://
        - Score other risks
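Slide 35 only names Benford's law as a possible extension. As background, the law states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so a set of transaction amounts whose leading digits stray far from that distribution may deserve a closer look. The Python sketch below shows the expected distribution and one simple deviation measure; the function names, the use of whole-unit amounts and the choice of measure are assumptions, not part of the talk.

```python
import math
from collections import Counter

def benford_expected():
    """Expected probability of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """Leading digit of a whole-unit amount (e.g. cents); 0 if the amount is 0."""
    n = abs(int(amount))
    while n >= 10:
        n //= 10
    return n

def first_digit_deviation(amounts):
    """Sum of absolute differences between observed and expected digit frequencies."""
    digits = [leading_digit(a) for a in amounts if leading_digit(a) != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    return sum(
        abs(counts.get(d, 0) / len(digits) - p)
        for d, p in benford_expected().items()
    )

# Hypothetical amounts that all start with 9, an unusually skewed pattern.
skewed = first_digit_deviation([900, 950, 9100])
```

The deviation ranges from 0 for a perfect match up to just under 2; what value should add to a risk score would have to be tuned against real transaction data.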
  36. FDF: Other Cyber Use Cases
      - Compromised creds
        - FTP
        - OWA
        - VPN
        - Custom apps
      - User profiles
        - Proxy logs
        - Logon times
      - Risk scoring
        - IPS Alert + AV Hit + Failed Logon + ?
  37. FDF: Side Story
      - One compromised FTP account reported
        - The client wanted to know how many other accounts were used for unauthorized access
        - ~600 active FTP accounts
      - Fortunately the client had a year’s worth of FTP logs in Splunk
      - Used the FDF to detect 14 additional accounts
  38. Key Takeaways
      - Baseline your data
      - inputlookup and outputlookup are very powerful baselining tools
      - Chaining eval statements is an effective way of scoring risk
      - Use every bit of information found in an individual log
      - Enrich what you can
  39. Q&A
