Your SlideShare is downloading. ×
0
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
MR201403 consideration and evaluation of using fuzzy hashing
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

MR201403 consideration and evaluation of using fuzzy hashing

195

Published on

‘fuzzy hashing’ was introduced in 2006 by Jesse Kornblum. …

‘fuzzy hashing’ was introduced in 2006 by Jesse Kornblum.

In malware analysis fuzzy hashing algorithms such as ssdeep are being introduced in recent years.

(IMHO) However, we don’t consider the effective usage of them enough • In this slides, we evaluate an effectiveness of classification of malware similarity by fuzzy hashing Background and purpose.

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
195
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. FFRI,Inc. Fourteenforty Research Institute, Inc. FFRI, Inc http://www.ffri.jp Monthly Research Consideration and evaluation of using fuzzy hashing Ver2.00.01
  • 2. FFRI,Inc. • Background and purpose • Basis of fuzzy hashing • An experiment • The result • Consideration Agenda 2
  • 3. FFRI,Inc. • ‘fuzzy hashing’ was introduced in 2006 by Jesse Kornblum – http://dfrws.org/2006/proceedings/12-Kornblum.pdf • In malware analysis fuzzy hashing algorithms such as ssdeep are being introduced in recent years • (IMHO) However, we don’t consider the effective usage of them enough • In this slides, we evaluate an effectiveness of classification of malware similarity by fuzzy hashing Background and purpose 3
  • 4. FFRI,Inc. • In general, cryptographic hashing like MD5 is popular and it has the attributes as follows: (cf. http://en.wikipedia.org/wiki/Cryptographic_hash_function) – it is easy to compute the hash value for any given message – it is infeasible to generate a message that has a given hash – it is infeasible to modify a message without changing the hash – it is infeasible to find two different messages with the same hash • Cryptographic hashing is often used for identify the same files • On the other hand, it is unsuitable to identify similar files because digests are completely different even if 1 bit is altered in the other file • In DFIR, this need exists and fuzzy hashing was developed to solve this problem Basis of Fuzzy hashing(1/4) 4
  • 5. FFRI,Inc. • Fuzzy hashing = Context Triggered Piecewise Hashing(CTPH) = Piecewise hashing + Rolling hashing • Piecewise Hashing – Dividing message into N-block and calculating hash value of each blocks • Rolling Hash – A method to calculate hash value of sub-message(position 1-3, 2-4…) fast – In general, if calculated values are the same, it is a high probability that original messages are identical Basis of Fuzzy hashing(2/4) 5 1 2 3 4 5 6 7 8 a b c d e f g h Calculating a hash value for each N chars(N=3)
  • 6. FFRI,Inc. • Fuzzy hashing(CTPH) – When rolling hash generates a specific value at any position, it calculates cryptographic hash value of the partial message from the beginning to the position – Generate a hash value by concatenating all (partial) hashes Basis of Fuzzy hashing(3/4) 6 1 2 3 4 5 6 7 8 a a a b b b c c ①calculating rolling hash, and the result is the specific value (trigger) ②calculating a hash value of this block by cryptographic hash ③calculating rolling hash value from position 4(trying to determine a next block)
  • 7. FFRI,Inc. • Fuzzy hashing(CTPH) – With almost identical messages, it would calculate a hash value of identical partial message stochastically – We can identify partial matches between similar messages Basis of Fuzzy hashing(4/4) 7 1 2 3 4 5 6 7 8 a a a b b b c c 1 2 3 4 5 6 7 8 b b b a a a d d a823 928c 817d 928c a823 1972 Hash values for each block 238c7d 8c2372 Hash value for entire message
  • 8. FFRI,Inc. • In general, usage of fuzzy hashing is proposed as follows: – Determining similar files(i.e. almost identical but MD5s aren’t matched) – Matching partial data in files • This time, we evaluate determining similar malware by fuzzy hashing • We make it clear “how effective it is in actually” and “what we should consider if we applying it” for the above An experiment(1/2) 8
  • 9. FFRI,Inc. • Preparation – Preparing 2,036 unique malware files in MD5s collected by ourselves – Calculating(fuzzy) hash values by ssdeep and similarity of all of each files • nCr: 2,036C2 = 2,071,630 combinations • Determining similar malware – Extracting all the pairs whose similarities are 50%-100% – Determining if the detection name of files in a pair is matched for each similarity threshold An experiment(2/2) 9 A B C … A 90 82 54 B 76 62 C 46 … similarity of each malware files(%) 80%+ ・A-B ・A-C 80%+ ・(A)Trojan.XYZ – (B)Trojan.XYX ・(A)Trojan.XYZ – (C)WORM.DEA Matched Not matched
  • 10. FFRI,Inc. • The higher the threshold is, the higher matching rate of detection name we get – Up to the threshold of 90% it keeps around 50-60% matching rate The result 10 Thenumberofpairswhose similarityareabovethreshold Threshold of similarity(%)
  • 11. FFRI,Inc. • Dividing “matched” pairs into a group who has “generic” in its name and the others • “matched(Generic)%” shows the same trend with the matching rate above -> The higher the threshold is, the more malware are detected as “generic” The rate detected by the name “Generic”? 11 Thenumberofpairswhose similarityareabovethreshold Threshold of similarity(%)
  • 12. FFRI,Inc. • A meaning of the result depends on if the AVV uses fuzzy hashing for generic detection – If they use fuzzy hashing for generic detection • The result is natural – If not • By using fuzzy hashing, we may obtain a similar result to the generic detection • If we use fuzzy hashing for generic detection, 90%+ similarity might be required with known malware (fuzzy) hash values Consideration 12
  • 13. FFRI,Inc. • E-Mail: research-feedback@ffri.jp • twitter: @FFRI_Research Contact Information 13

×