Metamorphic Malware Analysis and Detection


Modern malware that is metamorphic or polymorphic in nature mutates its code by employing code obfuscation and encryption to thwart detection. Conventional signature-based scanners therefore fail to detect such malware. To address the problem of detecting known variants of metamorphic malware, we propose a method based on bioinformatics techniques that are used effectively for protein and DNA matching. Instead of exact signature matching, more sophisticated signature(s) are extracted using multiple sequence alignment (MSA). The results show that the proposed method identifies malware variants with few false alarms and misses. The detection rate achieved with the proposed method is also better than that of the commercial antivirus products used in the study.


This work has been accepted by the 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).



Published in: Education, Technology, Business

Metamorphic Malware Analysis and Detection

  1. Bioinformatics Techniques for Metamorphic Malware Analysis and Detection
     Malaviya National Institute of Technology, Jaipur
     Supervisors: Dr. M. S. Gaur, Dr. V. Laxmi
     By: Grijesh Chauhan (2009PCP116)
  2. Outline
     • Malware & Metamorphic malware
     • Motivation
     • Objective
     • Bioinformatics Techniques
     • MOMENTUM
     • Dataset
     • Result & Analysis
     • References
  3. Malware
     Malware is software intended to infect and replicate.
     Threats:
     • Loss of data
     • Degraded computer system performance
     • Identity threat
     Two broad categories:
     • Metamorphic: the virus body changes on each replication
     • Polymorphic: encrypts the malicious payload to avoid detection
  4. Metamorphic Malware [1/2]
     • Metamorphic malware variants have similar functionality but different structure and signature.
     • This is similar to genetic diversity in biology.
     [Diagram: a metamorphic engine producing Variant-1, Variant-2 and Variant-3; it depicts metamorphic malware variants with reordered code.]
  5. Metamorphic Malware [2/2]
     • Metamorphic malware automatically re-codes itself each time it propagates or is distributed.
     • Conventional signature-based scanners are ineffective at detecting variants of the same malware.
     • Sophisticated signature(s) are required to detect metamorphic variants of malware.
  6. Motivation
     • Variants of metamorphic malware are generated using a small embedded metamorphic engine to defeat detection [2].
     • A limited number of instructions is used to generate variants so as to preserve functionality.
     • Metamorphic malware, like DNA/protein sequences, mutates from generation to generation; variants inherit functionality and some structural similarity from the ancestral malware.
  7. Objective
     • To devise a method for detecting metamorphic malware and its variants.
     • To extract abstract signature(s) using bioinformatics sequence alignment: the base code is preserved across generations, obfuscated using junk code, equivalent instructions, etc.
     • To identify unseen malware samples using the best representative signatures (group/single) of a family.
  8. Sequence Alignment [1/2]
     Sequence alignment is a way of arranging DNA/protein sequences to identify regions of similarity and so infer functional, structural or evolutionary relationships.
     Alignment methods:
     • Global alignment: align sequences end to end.
     • Local alignment: align a substring of one sequence with a substring of the other.
     • Multiple Sequence Alignment (MSA): align more than two sequences.
  9. Sequence Alignment [2/2]
     Global alignment:
       L G P S S K Q T G K G S - S R I D N
       L N - I T K S A G K G A I M R L D A
     Local alignment:
       - - - - - - T G - G - - - - - - -
       - - - - - - A G K G - - - - - - -
     Alignment parameters (figure legend): match, mismatch, gap (point of mutation).
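The global alignment idea above can be sketched for mnemonic sequences with the textbook Needleman-Wunsch dynamic programme. This is an illustrative sketch only: the function name `global_align`, the scoring values (match = 1, mismatch = -1, gap = -1) and the sample sequences are assumptions, not values taken from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two mnemonic lists.

    Scoring values are illustrative assumptions, not the slides' parameters.
    Returns (score, aligned_a, aligned_b) with "-" marking gaps.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align both
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical variant pair: the second sequence has one inserted junk "nop".
seq_a = ["push", "mov", "call", "jmp"]
seq_b = ["push", "mov", "nop", "call", "jmp"]
score, row_a, row_b = global_align(seq_a, seq_b)
print(score)
print(" ".join(row_a))
print(" ".join(row_b))
```

A local alignment (Smith-Waterman [5]) differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell rather than the corner.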
  10. Multiple Sequence Alignment
     • MSA is an extension of pairwise alignment to more than two sequences.
     • It is used to identify conserved regions across a group of sequences.

       M1    M2    M3    M4    M5
       add   add   add   -     add
       -     push  push  push  push
       mov   mov   mov   mov   mov
       -     call  jmp   jz    jmp
       jmp   jmp   mov   mov   mov

     • Mi: i-th malware instance
  11. Implementation of MSA
     MSA is implemented using the progressive technique (ClustalW [9]). Progressive MSA follows three steps:
     • Determine the similarity between each pair of sequences by pairwise alignment.
     • Construct a guide tree (phylogenetic tree) to represent the evolutionary relationship.
     • Build the MSA by aligning the most closely related groups first and the most distant groups last, following the guide tree.
  12. Phylogenetic Tree
     A phylogenetic tree depicts the evolutionary relationship among the sequences. It is used:
     • To form groups of similar viruses
     • To guide the MSA progressively so that closer groups are aligned first
     [Figure: tree over leaves A, B, D, F, E with the structure ( (E,(A,B)), (D,F) ).]
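To illustrate how a guide tree can be derived from pairwise distances, the sketch below uses simple average-linkage agglomerative clustering (UPGMA-style). This is a stand-in chosen for brevity and is not necessarily the tree-building method ClustalW applies; the function name `guide_tree` and all distance values are hypothetical, picked so that the grouping matches the tree on the slide.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a nested-tuple guide tree by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Illustrative stand-in, not ClustalW's own procedure.
    """
    # Each cluster is (subtree, tuple_of_leaf_names).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical distances: A and B are closest, D and F next, E is near A and B.
names = ["A", "B", "D", "E", "F"]
close = {("A", "B"): 0.1, ("D", "F"): 0.2, ("A", "E"): 0.3, ("B", "E"): 0.3}
dist = {frozenset(p): 0.8 for p in combinations(names, 2)}
dist.update({frozenset(p): d for p, d in close.items()})

tree = guide_tree(names, dist)
print(tree)
```

With these made-up distances the result groups the leaves as (D,F) and (E,(A,B)), the same topology as the slide's tree up to the order of the children. A progressive aligner would then align (A,B) first, add E, align (D,F), and finally merge the two groups.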
  13. Similarity Measurement
     • Alignment score: the sum of the scores specified for each aligned pair of mnemonics. The higher the score, the more similar the sequences.
     • Distance (d): calculated using the following formulas. The higher the distance, the more dissimilar the sequences.

       $$N_d = \frac{\#\text{mismatch}}{\#\text{match} + \#\text{mismatch}}$$

       $L_d$: expressed in terms of #mismatch, #match and #gap (the exact form is not recoverable from the slide text).

     • $N_d$ is the normalized distance; $L_d$ is the Levenshtein distance.
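As a worked example of the normalized distance, the sketch below counts matches, mismatches and gaps in an already aligned pair and evaluates Nd = #mismatch / (#match + #mismatch). The helper names are hypothetical, and treating any column containing "-" as a gap column is an assumption. The input is the global alignment shown on slide 9.

```python
def alignment_counts(row_a, row_b, gap="-"):
    """Return (#match, #mismatch, #gap) for two aligned rows of equal length."""
    match = mismatch = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:   # assumption: any column with "-" is a gap column
            gaps += 1
        elif x == y:
            match += 1
        else:
            mismatch += 1
    return match, mismatch, gaps


def normalized_distance(row_a, row_b):
    """Nd = #mismatch / (#match + #mismatch); gap columns are ignored."""
    match, mismatch, _ = alignment_counts(row_a, row_b)
    total = match + mismatch
    return mismatch / total if total else 0.0


# The global alignment from slide 9.
row_a = "L G P S S K Q T G K G S - S R I D N".split()
row_b = "L N - I T K S A G K G A I M R L D A".split()

counts = alignment_counts(row_a, row_b)
nd = normalized_distance(row_a, row_b)
print(counts, nd)
```

For this pair there are 7 matches, 9 mismatches and 2 gap columns, so Nd = 9 / 16 = 0.5625.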
  14. Identification of Base Malware
     The base malware of a family is the sample most similar to all the others, i.e. the one with the highest sum of pairwise alignment scores (SoP [3]).

     Score matrix:
            M1   M2   M3   M4   SoP
       M1   -    7    -2   1    6
       M2   7    -    -3   0    4
       M3   -2   -3   -    1    -4
       M4   1    0    1    -    2

     M1 is the base malware.
     • Mi: i-th malware instance
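The base-malware selection can be reproduced from the score matrix on the slide. A minimal sketch, assuming the pairwise scores are already available as a nested dictionary (the names `sum_of_pairs` and `base_malware` are illustrative):

```python
# Pairwise alignment scores taken from the score matrix on the slide.
scores = {
    "M1": {"M2": 7, "M3": -2, "M4": 1},
    "M2": {"M1": 7, "M3": -3, "M4": 0},
    "M3": {"M1": -2, "M2": -3, "M4": 1},
    "M4": {"M1": 1, "M2": 0, "M3": 1},
}


def sum_of_pairs(scores):
    """Sum each sample's pairwise scores against all other samples."""
    return {sample: sum(row.values()) for sample, row in scores.items()}


def base_malware(scores):
    """Pick the sample with the highest sum-of-pairs score."""
    sop = sum_of_pairs(scores)
    return max(sop, key=sop.get)


sop = sum_of_pairs(scores)
base = base_malware(scores)
print(sop, base)
```

The sums are 6, 4, -4 and 2, so M1 is selected, in agreement with the SoP column.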
  15. Implementation Method
     MetamOrphic Malware ExploratioN Technique Using MSA (MOMENTUM) demonstrates the applicability of bioinformatics techniques to metamorphic malware analysis and detection.
     The two phases of MOMENTUM are:
     • Analysis of metamorphism in tools/real malware
     • Signature modelling and testing
  16. MOMENTUM [1/2]
     [Flow diagram for metamorphism analysis. Elements:
     • Input: metamorphic families (virus tools and real malware)
     • Intra-family pairwise alignment, producing a distance matrix, the base file and alignments of two files
     • Inter-family pairwise alignment
     • Questions answered: Metamorphic? Obfuscation? Families overlap?]
  17. MOMENTUM [2/2]
     [Diagram depicting signature modelling and testing. Elements:
     • Divide the data set into two parts: training set and testing set
     • Extract the single signature and the group signature, each with its own threshold
     • Test with single and group signatures
     • Scan logs]
  18. MSA Signature
     • The MSA signature (single signature) is the sequence of mnemonics preserved in the alignment.
     • A mnemonic that appears in more than 50% of a row is included in the MSA signature.

       M1    M2    M3    M4    M5    MSA Sign   Mt
       push  push  -     -     push  push       push
       -     -     jump  jump  jump  jump       jump
       mov   mov   -     lea   xor              lea
       call  call  call  call  call  call       call
       push  mov   mov   -     mov   mov        push

     • Mi: i-th malware instance; Mt: test sample
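The 50% rule can be sketched as a per-row majority vote over the aligned mnemonics. In this sketch the function name `msa_signature` is hypothetical, and counting gap cells in the row length (so a mnemonic needs more than half of all five columns) is an assumption; the rows are those of the table on the slide.

```python
from collections import Counter


def msa_signature(rows, gap="-"):
    """Keep, per alignment row, the mnemonic present in more than 50% of cells.

    Rows with no majority mnemonic yield None. Gap cells count towards the
    row length, which is an assumption about the slide's rule.
    """
    signature = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        top = counts.most_common(1)
        if top and top[0][1] > len(row) / 2:
            signature.append(top[0][0])
        else:
            signature.append(None)
    return signature


# Alignment rows for M1..M5 from the slide.
rows = [
    ["push", "push", "-", "-", "push"],
    ["-", "-", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["push", "mov", "mov", "-", "mov"],
]

signature = msa_signature(rows)
print(signature)
```

The third row has no mnemonic above 50% (mov appears in only two of five cells), so it contributes nothing to the signature, matching the empty cell in the MSA Sign column.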
  19. Group Signature
     • The group signature is extracted from the single signatures of the subgroups.
     • Subgroups are formed using the evolutionary relationship.
     • A single signature is extracted for each subgroup, and the signatures are combined in the form of wildcards.

       Sign1  Sign2  Sign3  Sign4  Sign5  Group Sign    Mt
       push   push   -      -      push   push          push
       jz     jz     jump   jump   jump   jump|jz       jz
       mov    mov    -      lea    xor    mov|lea|xor   lea
       call   call   call   call   call   call          call
       -      mov    mov    -      push   mov|push      push

     • Signi: signature for the i-th subgroup in a family; Mt: test sample
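Combining subgroup signatures into wildcard alternatives can be sketched as collecting, for each row, every mnemonic that occurs in any subgroup signature. The function names are hypothetical, and the position-by-position match against the test sample is a simplification: the slides score test samples against a threshold rather than requiring an exact match.

```python
from collections import Counter


def group_signature(rows, gap="-"):
    """Per row, list the distinct mnemonics across subgroup signatures.

    Alternatives are ordered by frequency, ties by first appearance.
    """
    return [
        [m for m, _ in Counter(x for x in row if x != gap).most_common()]
        for row in rows
    ]


def matches(group_sig, sample):
    """Simplified check: each sample mnemonic must be one of the row's alternatives."""
    return len(sample) == len(group_sig) and all(
        m in alternatives for m, alternatives in zip(sample, group_sig)
    )


# Subgroup signatures Sign1..Sign5 from the slide, one list per row.
rows = [
    ["push", "push", "-", "-", "push"],
    ["jz", "jz", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["-", "mov", "mov", "-", "push"],
]
test_sample = ["push", "jz", "lea", "call", "push"]  # Mt from the slide

group_sig = group_signature(rows)
rendered = ["|".join(alternatives) for alternatives in group_sig]
print(rendered)
print(matches(group_sig, test_sample))
```

Unlike the majority vote used for the single signature, the alternatives keep the minority mnemonics (jz, lea, xor, push), which are the points of mutation the group signature is meant to preserve.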
  20. Threshold
     [Figure: score axis starting at 0, with the benign range (Bmin to Bmax) on the left, the malware range (Mmin to Mmax) on the right, and the threshold between Bmax and Mmin.]
     Where:
     • Bmin: benign sample with minimum score
     • Bmax: benign sample with maximum score
     • Mmin: malware sample with minimum score
     • Mmax: malware sample with maximum score
     • Threshold = (Bmax + Mmin) / 2, with Threshold > Bmax
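The threshold rule is simple arithmetic on the training scores. A small sketch with made-up scores follows; the function name and the score values are hypothetical. The check that Mmin exceeds Bmax is what guarantees the stated condition Threshold > Bmax.

```python
def detection_threshold(benign_scores, malware_scores):
    """Threshold = (Bmax + Mmin) / 2, the midpoint between the two score ranges."""
    b_max = max(benign_scores)
    m_min = min(malware_scores)
    if m_min <= b_max:
        # The midpoint exceeds Bmax only when the two ranges do not overlap.
        raise ValueError("benign and malware score ranges overlap")
    return (b_max + m_min) / 2


# Hypothetical signature-matching scores from a training set.
benign_scores = [2, 5, 9]
malware_scores = [15, 20, 31]

threshold = detection_threshold(benign_scores, malware_scores)
print(threshold)
```

Here Bmax = 9 and Mmin = 15, so the threshold is 12; a test sample scoring above it would be flagged as malware.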
  21. Dataset [1/2]
     Dataset description:

       Type       Source                      #Family   #Instances
       Synthetic  NGVCK, PSMPC, G2, MPCGEN    46        1051
       Real       User Agencies, VxHeavens    52 + 1*   1209
       Benign     System32, Cygwin etc.       1         1501*

     * consists of unknown viruses (in test set).
     The dataset is divided equally into training and testing sets.
  22. Dataset [2/2]
     • All samples are in Portable Executable (PE) format.
     • Samples are unpacked using:
       • a dynamic unpacker (EtherUnpack [7])
       • a signature-based unpacker (GUNPacker [10])
     • Malware families are created from the combined scan results of 14 antiviruses.
     • Benign samples are also scanned.
  23. Result for Intra Family
     [Bar chart: average distance (between 0 and 1) for NGVCK, PSMPC, G2 and MPCGEN.]
     • Non-zero values indicate the presence of metamorphism in the synthetic data.
     • Levenshtein distance is high due to junk code insertion.
     • In spite of high global distances, local distances are low in most of the samples. This indicates the presence of similar regions in the code.
  24. Result for Inter Family
     [Bar chart: average distance (between 0 and 1) for NGVCK, PSMPC, G2, MPCGEN and VX Heavens.]
     • Distance is less than the intra-family distance. This indicates that most of the malware share some base code.
     • Levenshtein distance is higher because of changes in functionality.
  25. Comparative Analysis

       Virus type   Replacements/Alignment   Avg. SoD   Obfuscation
       NGVCK        47                       1.03       Average, Simple
       G2           3                        1.45       Low, Simple
       MPCGEN       31                       0.61       Average, Simple
       PSMPC        1                        1.35       Low, Weak
       Vx-Heavens   122                      8.3        Large, Complex

     • Viruses generated using tools belong to the same family.
     • Families of real malware are distinct.
     • In PSMPC, loop and jump instructions contribute to obfuscation; this increases the distance between samples.
     • NGVCK viruses overlap with real malware (Savior).
     • SoD: sum of distances of a family from all other families
  26. Detection Results
     [Bar chart: TPR and FPR for the single (MSA) signature and the group signature.]
     • 95.5% of malware is detected with the MSA signature; detection with the group signature is 72.4%.
     • 53% of benign samples are falsely detected as malware with the MSA signature, due to the loss of the mnemonics used for mutation in malware.
     • The group signature preserves the points of mutation, which are absent in benign samples.
  27. MOMENTUM with Antiviruses
     [Bar chart: detection rate of MOMENTUM and the antivirus products.]
     • MOMENTUM (group signature) is found to be comparable to the best antiviruses.
     • Of the 35 malware samples that the antiviruses did not detect, MOMENTUM detected 20.
  28. Scope for Improvement
     • Instead of using the same mismatch score throughout, compute a weighted score for each pair of mnemonics using the frequency of mismatches.
     • In the alignment, the operand part can be considered to verify actual changes (replacement/gap). This can reveal how the morpher preserves functionality.
  29. List of Publications
     [1] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Detecting Malicious Files using Non-Signature based Methods", (to appear) Oxford Computer Journal.
     [2] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, "Malware Detection using Non-Signature based Method", In Proceedings of IEEE International Conference on Network Communication and Computer (ICNCC 2011), pp-427-43, DOI: 978-1-4244-9551-1/11.
  30. References
     [1] E. Karim, A. Walenstein, A. Lakhotia, "Malware Phylogeny using Permutation of code", In Proceedings of EICAR 2005, pp 167-174.
     [2] M. R. Chouchane and A. Lakhotia, "Using engine signature to detect metamorphic malware", In Proceedings of the 4th ACM workshop on Recurring malcode, WORM 06, 2006, pp 73-78.
     [3] Mona Singh, "Multiple Sequence Alignment", Lecture (last viewed on 14-6-2011).
     [4] Mona Singh, "Phylogenetics", Lecture (last viewed on 14-6-2011).
     [5] T. Smith and M. Waterman, "Identification of Common Molecular Subsequences", Journal of Molecular Biology, pp 195-197, 1981.
     [6] Mark Stamp, Wing Wong, "Hunting for metamorphic engines", Journal in Computer Virology, 2(3):211-229.
  31. References
     [7] Ether for Malware Unpacking (viewed on 14-6-2011).
     [8] Jian Li, Jun Xu, Ming Xu, HengiLi Zhao, Ning Zheng, "Malware Obfuscation Measuring via Evolutionary Similarity", In Proceedings of IEEE Int. Conference on Future Information Network 2009.
     [9] Larkin MA et al., "Clustal W and Clustal X version 2.0", Bioinformatics, 23, 2947-2948, 2007.
     [10] GUnPacker (viewed on 14-6-2011).
  32. Thanks!