A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

ASE 2013 presentation on malware detection. Collaboration between NTU, Singapore and SnT Centre, Luxembourg

Transcript

  • 1. A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling
    Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni
    Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg
    School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  • 2. What is malware?
    Malware (malicious + software) is software that does malicious things without the victim's knowledge.
  • 3. Motivation
    • More than 403 million new malware variants were created in 2011, a 41% increase over 2010.
    • On average, around 55,000 new malware samples were reported per day.
    • The exponential growth of malware is a major threat to the software industry.
  • 4. Problem Definition 1/2
    • New malware has become very sophisticated.
    • Malware evades traditional anti-virus signatures using various obfuscation techniques.
    • Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).
  • 5. Problem Definition 2/2
    • Scalability is a major problem in existing behavior-based malware detection techniques:
      • The malware feature space grows in proportion to the number of samples under examination.
      • The techniques are computationally very intensive.
  • 6. Related Work 1/2
    • The practicality and efficiency of behavior-based malware detection depend on:
      • size of the feature space
      • computational complexity
      • overheads (e.g., pre-processing)
      • detection accuracy
    • Simple malware behavior models (e.g., n-gram, m-bag, and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.
  • 7. Related Work 2/2
    • Complex malware behavior models (e.g., system call dependency graphs) are highly computationally intensive.
  • 8. Behavior Modeling: An Overview
    • A software program performs actions on various operating system resources.
    • An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).
    • The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.
    • For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
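The mapping from system calls to actions can be pictured as a simple lookup. The sketch below is illustrative only: the table `SYSCALL_TO_ACTION` and the helper `to_action` are hypothetical names, and the table holds just the system calls named in the slides, not the authors' full mapping.

```python
# Hypothetical lookup table: only the system calls named in the slides appear
# here; a real mapping would cover many more calls.
SYSCALL_TO_ACTION = {
    "NtReadFile": "ReadFile",
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
}


def to_action(syscall):
    """Return the higher-level action for a system call, or None if unmapped."""
    return SYSCALL_TO_ACTION.get(syscall)


# Two differently named system calls collapse into one action.
print(to_action("NtCreateProcess"))
print(to_action("NtCreateProcessEx"))
```

Both calls print the same action name, which is the point of modeling actions rather than raw system calls.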
  • 9. Operating System Resource Types
    • File System
    • Registry
    • Process and Thread
    • Network
    • Synchronization
    • Section
  • 10. Bounded Feature Space Behavior Modeling (BOFM)
    Malware feature: for each type of OS resource, the set of actions that malware performs on one instance of that resource type constitutes a feature of the malware.
    • Example: malware performs the CreateFile and DeleteFile actions on the file instance C:\foo.exe, and the DeleteFile action on another file instance, C:\abc.dll.
    • This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
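The feature definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `extract_features` and the `(resource_type, instance, action)` trace format are assumptions made for the example, and the trace reproduces the file example from the slide.

```python
from collections import defaultdict


def extract_features(trace):
    """Group actions by resource instance and return the features per type.

    `trace` is an iterable of (resource_type, instance, action) tuples
    (a hypothetical format). A feature is the frozenset of actions observed
    on one instance of a resource type.
    """
    actions_per_instance = defaultdict(set)
    for rtype, instance, action in trace:
        actions_per_instance[(rtype, instance)].add(action)

    features = defaultdict(set)
    for (rtype, _instance), actions in actions_per_instance.items():
        features[rtype].add(frozenset(actions))
    return dict(features)


# The example from the slide: two file instances, two features.
trace = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = extract_features(trace)
print(features["file"])
```

Using `frozenset` makes each feature hashable, so the features of one resource type can themselves be collected in a set.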
  • 11. Properties of BOFM Features 1/2
    Goal: to be more resilient to commonly used obfuscation techniques.
    • Property 1: Regardless of how many times an action is performed on an OS resource instance, it is counted only once in the final feature set. E.g., if the ReadFile action is performed several times on the file instance C:\Windows\...\sysfile2.dll, this behavior is modeled by the BOFM feature {ReadFile}.
    • Property 2: The sequence in which malware performs the actions is ignored in feature construction. E.g., the malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.
  • 12. Properties of BOFM Features 2/2
    • Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature. E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as a single BOFM feature {CreateFile, DeleteFile}.
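All three properties fall out naturally if a feature is represented as a set of action names. The sketch below assumes that representation (Python `frozenset`); the helper `feature` is a hypothetical name, and the action names and paths are the ones used in the slide examples.

```python
def feature(actions):
    """Model a BOFM feature as an unordered, duplicate-free set of actions."""
    return frozenset(actions)


# Property 1: repeating an action on the same instance does not change the feature.
assert feature(["ReadFile", "ReadFile", "ReadFile"]) == feature(["ReadFile"])

# Property 2: the order in which actions occur is ignored.
assert feature(["ReadFile", "QueryFileInformation"]) == feature(
    ["QueryFileInformation", "ReadFile"]
)

# Property 3: identical action sets on different instances of the same
# resource type collapse into a single feature.
per_instance = {
    r"C:\Windows\abc.dll": feature(["CreateFile", "DeleteFile"]),
    r"D:\Personel\foo.exe": feature(["DeleteFile", "CreateFile"]),
}
file_features = set(per_instance.values())
print(len(file_features))
```

The final set contains one element, since both file instances yield the same action set.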
  • 13. Bounded Feature Space
    Goal: avoid growth of the malware feature space in proportion to the number of samples under examination.
    • Let j be an OS resource type [definition of j not preserved in transcript].
    • The total number k_j of possible actions that malware may perform on an OS resource instance of type j is a constant.
    • The maximum number m_j of possible features for OS resource type j is therefore also a constant [equation not preserved in transcript].
    • The maximum number N of possible features over all resource types is likewise a constant [equation not preserved in transcript].
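The equations on this slide did not survive extraction. A plausible reconstruction follows from the feature definition, under the assumption (not confirmed by the transcript) that a feature may be any non-empty subset of the k_j actions available for resource type j:

```latex
m_j = 2^{k_j} - 1,
\qquad
N = \sum_{j} m_j = \sum_{j} \left( 2^{k_j} - 1 \right)
```

Under this reading, each k_j is fixed by the set of actions defined for the resource type rather than by the sample set, so N stays the same however many samples are examined.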
  • 14. OS Resource Types and Corresponding Actions
    [Table of resource types and actions not preserved in transcript]
    The total number of malware features (N) for these six OS resource types is 16,652.
  • 15. Model Construction Workflow
    [Workflow figure and example feature vector not preserved in transcript]
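Since the slide's example feature vector is missing, the following sketch shows one way a sample's features could be turned into a fixed-length vector. It assumes a binary presence/absence encoding over every non-empty subset of actions, which the transcript does not confirm; the function names and the three-action list are hypothetical and chosen only to keep the example small.

```python
from itertools import combinations


def all_features(actions):
    """Enumerate every non-empty subset of `actions` in a fixed order."""
    ordered = sorted(actions)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def to_vector(sample_features, feature_index):
    """Return 1 for each indexed feature present in the sample, else 0."""
    return [1 if f in sample_features else 0 for f in feature_index]


# Hypothetical, truncated action list for the file resource type.
file_actions = ["CreateFile", "DeleteFile", "ReadFile"]
index = all_features(file_actions)  # 2**3 - 1 = 7 possible features

sample = {frozenset({"CreateFile", "DeleteFile"}), frozenset({"DeleteFile"})}
vector = to_vector(sample, index)
print(vector)  # [0, 1, 0, 1, 0, 0, 0]
```

The vector length depends only on the action list, not on how many samples are processed, which mirrors the bounded feature space idea.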
  • 16. Detection Method
    • Machine learning (ML) classification techniques are used to build the malware detection models.
    • Logistic Regression (LR) and Support Vector Machine (SVM) classifiers are used in our experiments.
    • The malware detection process involves two phases:
      • Phase 1: model building
      • Phase 2: model evaluation
  • 17. Experimental Dataset
    • A training set of 5,000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.
  • 18. Experimental Results
    • SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% false positive (FP) rate.
    • Each balanced test set consists of 20 malware samples randomly selected from the pool of 300, plus the 20 benign samples.
    • On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
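The slides do not define how accuracy and FP rate were computed. The sketch below uses the standard definitions as an assumption: accuracy is the share of all samples classified correctly, and FP rate is the share of benign samples flagged as malware. The function name and the count of two missed malware samples are hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (accuracy, false_positive_rate) from confusion-matrix counts.

    tp/fn: malware detected / missed; fp/tn: benign flagged / passed.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fp_rate


# Balanced test set (20 malware, 20 benign) with every sample classified correctly.
accuracy, fp_rate = detection_metrics(tp=20, fn=0, fp=0, tn=20)
print(accuracy, fp_rate)  # 1.0 0.0

# Hypothetical counts on the full test set (300 malware, 20 benign): two missed
# malware samples and no false positives is one combination that rounds to 99.4%.
full_accuracy, full_fp_rate = detection_metrics(tp=298, fn=2, fp=0, tn=20)
print(round(full_accuracy * 100, 1), full_fp_rate)  # 99.4 0.0
```

With only 20 benign samples, a single false positive would already be a 5% FP rate, so a reported 1% FP rate presumably reflects averaging over several runs; the slides do not say.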
  • 19. Comparison with Canali et al. (ISSTA 2012)
    • Both achieve 99% detection accuracy.
    • However:
      • BOFM generated only 569 active features, whereas Canali et al. generated several million.
      • Extracting malware features took 1.67 hours with BOFM, compared with around 48 hours for Canali et al.
      • Training the SVM classifier took 26 seconds and consumed only 200 MB of RAM, whereas Canali et al.'s approach consumed more than 1 GB of RAM to perform signature matching.
      • BOFM is much more efficient and scalable.
  • 20. Conclusion
    • Malware evades traditional anti-virus signatures using various obfuscation techniques.
    • Behavior-based malware detection is an increasingly common solution.
    • Scalability is a major problem in existing behavior-based malware detection techniques.
    • We proposed a bounded feature space behavior modeling (BOFM) technique to address the scalability issue.
    • BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.
    • Benchmark: BOFM combined with SVM achieved 100% detection accuracy in less than a minute, using 200 MB of memory.
  • 21. Feature Space Analysis
    • Comparison of malware and benign feature spaces.
    • 57% of the features are unique to malware, which suggests that BOFM is a promising technique for modeling malware behavior.
  • 22. Brief Analysis of Interesting Features
    • The 'NotifyChangeKey' action is used far more widely by malware samples than by benign samples (86% vs. 15%).
    • The 'DeleteKey' action is also widely used by malware samples.
    • Actions such as 'OpenFile', 'GetFileAttributes', 'CreateMutex', and 'ReleaseMutex' appeared widely in both malware and benign samples.