
A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling


ASE 2013 presentation on malware detection. Collaboration between NTU, Singapore and SnT Centre, Luxembourg


  1. A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling
     Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni
     Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg
     School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  2. What is malware?
     Malware (malicious + software) is software that does malicious things without the victim's knowledge.
  3. Motivation
     • More than 403 million new malware variants were created in 2011, a 41% increase over 2010.
     • On average, around 55,000 new malware samples were reported per day.
     • The exponential growth of malware is a major threat to the software industry.
  4. Problem Definition 1/2
     • New malware has become very sophisticated.
     • Malware evades traditional anti-virus signatures by using various obfuscation techniques.
     • Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).
  5. Problem Definition 2/2
     • Scalability is a major problem in existing behavior-based malware detection techniques:
         • The malware feature space grows in proportion to the number of samples under examination.
         • The techniques are computationally very intensive.
  6. Related Work 1/2
     • The practicality and efficiency of behavior-based malware detection depend on:
         • size of the feature space
         • computational complexity
         • overheads (e.g., pre-processing)
         • detection accuracy
     • Simple malware behavior models (e.g., n-gram, m-bag and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.
  7. Related Work 2/2
     • Complex malware behavior models (e.g., system call dependency graphs) are computationally very intensive.
  8. Behavior Modeling: An Overview
     • Software programs perform actions on various operating system resources.
     • An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).
     • The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.
     • For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
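The mapping from system calls to actions can be sketched as a simple lookup. This is a minimal illustration, not the authors' implementation: the table name `SYSCALL_TO_ACTION` and the helper `to_actions` are hypothetical, and the `NtReadFile` to `ReadFile` entry is an assumption (the slides only confirm the two `CreateProcess` mappings).

```python
# Hypothetical lookup table: raw system call name -> higher-level action.
# Only the two CreateProcess entries are stated in the slides; the
# NtReadFile entry is an assumed mapping added for illustration.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
    "NtReadFile": "ReadFile",
}


def to_actions(syscalls):
    """Translate a system-call trace into actions, skipping unmapped calls."""
    return [SYSCALL_TO_ACTION[s] for s in syscalls if s in SYSCALL_TO_ACTION]


trace = ["NtCreateProcess", "NtReadFile", "NtCreateProcessEx"]
actions = to_actions(trace)
print(actions)
```

Both process-creation calls collapse to the same action name, so a model built on actions does not distinguish between them.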
  9. Operating System Resource Types
     • File System
     • Registry
     • Process and Thread
     • Network
     • Synchronization
     • Section
  10. Bounded Feature Space Behavior Modeling (BOFM)
     Malware feature: for each type of OS resource, the set of actions performed by malware on an instance of that OS resource type constitutes a feature of the malware.
     • Example: malware performs the CreateFile and DeleteFile actions on a file instance C:\foo.exe, and the DeleteFile action on another file instance C:\abc.dll.
     • This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
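The feature definition above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the event format `(resource_type, instance, action)` and the function name `bofm_features` are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def bofm_features(events):
    """Group actions by resource instance and return the resulting features.

    events: iterable of (resource_type, instance, action) tuples (assumed format).
    Returns a set of (resource_type, frozenset_of_actions) pairs.
    """
    per_instance = defaultdict(set)
    for rtype, instance, action in events:
        per_instance[(rtype, instance)].add(action)
    return {(rtype, frozenset(acts)) for (rtype, _), acts in per_instance.items()}


# The example from the slide: two file instances, three observed actions.
events = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = bofm_features(events)
for rtype, acts in sorted(features, key=lambda f: sorted(f[1])):
    print(rtype, sorted(acts))
```

The instance name is used only for grouping and is dropped from the final feature, which is what keeps the feature space independent of the concrete file names a sample touches.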
  11. Properties of BOFM Features 1/2
     Goal: to be more resilient to commonly used obfuscation techniques.
     • Property 1: Regardless of the number of times an action is performed on an OS resource instance, it is considered only once in the final feature set.
       E.g., the ReadFile action is performed several times on a file instance C:\Windows\...\sysfile2.dll; this behavior is modeled by the BOFM feature {ReadFile}.
     • Property 2: The sequence in which the actions are performed by malware is ignored in feature construction.
       E.g., the malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.
  12. Properties of BOFM Features 2/2
     • Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature.
       E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as a single BOFM feature {CreateFile, DeleteFile}.
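The three properties can be checked on a toy trace. The sketch below is an illustration only: the helper `feature_set`, the event format, and the instance names `a.dll` and `b.dll` are hypothetical. It shows that repeating an action, reordering actions, or applying the same actions to a second instance leaves the feature set unchanged.

```python
def feature_set(trace):
    """Collapse a trace of (resource_type, instance, action) events into features."""
    actions_by_instance = {}
    for rtype, instance, action in trace:
        actions_by_instance.setdefault((rtype, instance), set()).add(action)
    return {(rtype, frozenset(a)) for (rtype, _), a in actions_by_instance.items()}


base = [
    ("file", "a.dll", "ReadFile"),
    ("file", "a.dll", "QueryFileInformation"),
]

# Property 1: the same action repeated several times on one instance.
repeated = base + [("file", "a.dll", "ReadFile")] * 3

# Property 2: the same actions in a different order.
reordered = list(reversed(base))

# Property 3: the identical action set on a second instance of the same type.
second_instance = base + [
    ("file", "b.dll", "QueryFileInformation"),
    ("file", "b.dll", "ReadFile"),
]

same = [feature_set(t) == feature_set(base)
        for t in (repeated, reordered, second_instance)]
print(same)  # [True, True, True]
```

Using sets for the actions gives Properties 1 and 2 for free, and collecting the per-instance sets into an outer set gives Property 3.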
  13. Bounded Feature Space
     Goal: avoid growth of the malware feature space in proportion to the number of samples under examination.
     • Let j be an OS resource type.
     • The total number k_j of possible actions that malware may perform on an OS resource instance of type j is a constant.
     • The maximum number m_j of possible features with regard to OS resource type j is therefore also a constant.
     • The maximum number N of possible features over all resource types is always a constant.
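The equations on this slide were images and did not survive in the transcript. One reconstruction that follows from the feature definition (a feature is a non-empty set of actions on one resource type) is sketched below. The index range 1 to 6 is taken from the six resource types listed earlier; the closed forms are an assumption derived from that definition, not copied from the slide.

```latex
\begin{align}
  j   &\in \{1, \dots, 6\}
      && \text{(the six OS resource types)} \\
  m_j &= 2^{k_j} - 1
      && \text{(non-empty subsets of the $k_j$ possible actions)} \\
  N   &= \sum_{j=1}^{6} m_j = \sum_{j=1}^{6} \left( 2^{k_j} - 1 \right)
\end{align}
```

Under this reading, each k_j is fixed by the operating system, so N depends only on the action catalog and not on how many samples are analyzed.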
  14. OS Resource Types and Corresponding Actions
     The total number of malware features (N) extracted from these six OS resource types is 16,652.
  15. Model Construction Work Flow
     Example feature vector
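The example feature vector on this slide was an image and is not in the transcript. A minimal sketch of one way to lay out such a vector follows, assuming a binary presence/absence encoding in which every possible action set owns a fixed slot. The action catalogs in `ACTIONS`, the bitmask indexing scheme, and all function names are hypothetical; the real per-type action lists are not given in the slides.

```python
# Hypothetical action catalogs (k_file = 3, k_registry = 2).
ACTIONS = {
    "file": ["CreateFile", "DeleteFile", "ReadFile"],
    "registry": ["OpenKey", "DeleteKey"],
}

# Each resource type owns a contiguous block of 2**k - 1 slots,
# one per non-empty subset of its k actions.
OFFSETS, N = {}, 0
for rtype, acts in ACTIONS.items():
    OFFSETS[rtype] = N
    N += 2 ** len(acts) - 1


def feature_index(rtype, action_set):
    """Map a non-empty action set to its fixed slot using a bitmask."""
    mask = sum(1 << ACTIONS[rtype].index(a) for a in action_set)
    return OFFSETS[rtype] + mask - 1


def to_vector(features):
    """Binary vector of length N: 1 where the sample exhibits the feature."""
    vec = [0] * N
    for rtype, action_set in features:
        vec[feature_index(rtype, action_set)] = 1
    return vec


sample = {
    ("file", frozenset({"CreateFile", "DeleteFile"})),
    ("registry", frozenset({"DeleteKey"})),
}
vec = to_vector(sample)
print(N, vec)
```

Because the slot of each feature is computed from the action catalog alone, the vector length stays the same no matter how many samples are processed.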
  16. Detection Method
     • Machine learning (ML) classification techniques are used to build the malware detection models.
     • Logistic Regression (LR) and Support Vector Machine (SVM) are used in our experiments.
     • The malware detection process involves two phases:
         • Phase 1: model building
         • Phase 2: model evaluation
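The two phases can be illustrated with a tiny logistic regression written from scratch. This is a sketch for intuition only: it uses plain batch gradient descent rather than whatever ML toolkit the authors used, and the toy feature vectors, learning rate, and epoch count are made up.

```python
import math


def train_lr(X, y, lr=0.5, epochs=200):
    """Phase 1: fit logistic-regression weights with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted malware probability
            for j, xj in enumerate(xi):
                grad_w[j] += (p - yi) * xj
            grad_b += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, x):
    """Phase 2: label a feature vector (1 = malware, 0 = benign)."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Toy binary feature vectors, invented for illustration.
X_train = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0],
           [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
y_train = [1, 1, 1, 0, 0, 0]

model = train_lr(X_train, y_train)
predictions = [predict(model, x) for x in ([1, 1, 1, 0], [0, 1, 1, 1])]
print(predictions)
```

In this toy data the first feature appears only in malware and the last only in benign samples, so the learned weights separate the two held-out vectors.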
  17. Experimental Dataset
     • A training set of 5000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.
  18. Experimental Results
     • SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% FP rate.
     • Each balanced test set consists of 20 malware samples randomly selected from the pool of 300, plus the 20 benign samples.
     • On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
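For readers unfamiliar with the two metrics reported here, the sketch below shows how detection accuracy and false-positive rate are computed from confusion-matrix counts. The slides do not give the confusion matrices, so the counts used are hypothetical; they were chosen only to match the stated test-set size of 300 malware and 20 benign samples.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp, tn):
    """Fraction of benign samples that were wrongly flagged as malware."""
    return fp / (fp + tn)


# Hypothetical counts: 2 of 300 malware samples missed, no benign sample flagged.
tp, fn, tn, fp = 298, 2, 20, 0
acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(fp, tn)
print(f"accuracy = {acc:.1%}, FP rate = {fpr:.0%}")
```

With so few benign samples in the test set, a single misclassified benign program moves the FP rate by five percentage points, which is worth keeping in mind when reading the figures above.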
  19. Comparison with Canali et al. (ISSTA 2012)
     • Both achieve 99% detection accuracy.
     • However:
         • BOFM generated only 569 active features, whereas Canali et al. generated several million.
         • It took 1.67 hours to extract malware features using BOFM, while Canali et al. took around 48 hours.
         • It took 26 seconds to train the SVM classifier, consuming only 200 MB of RAM, whereas Canali et al.'s approach consumed more than 1 GB of RAM to perform signature matching.
         • BOFM is much more efficient and scalable.
  20. Conclusion
     • Malware evades traditional anti-virus signatures by using various obfuscation techniques.
     • Behavior-based malware detection is an increasingly common solution.
     • Scalability is a major problem in existing behavior-based malware detection techniques.
     • We proposed a bounded feature space malware behavior modeling (BOFM) technique to address the scalability issue.
     • BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.
     • Benchmark: BOFM combined with SVM achieved 100% detection accuracy in less than a minute and with 200 MB of memory.
  21. Feature Space Analysis
     • Comparison of malware and benign feature spaces.
     • The 57% share of unique malware features suggests that BOFM is a promising technique for modeling malware behavior.
  22. Brief Analysis of Interesting Features
     • The 'NotifyChangeKey' action is used far more widely by malware samples than by benign samples (86% vs. 15%).
     • The 'DeleteKey' action is also widely used by malware samples.
     • Actions such as 'OpenFile', 'GetFileAttributes', 'CreateMutex' and 'ReleaseMutex' appeared widely in both malware and benign samples.
