Using a Hex Editor and your Brain
3rd CodeEngn ReverseEngineering Seminar, 2009.07.04
안기찬 ; Ahn Ki-Chan
wringer@paran.com
www.Externalist.org
[2009 CodeEngn Conference 03] externalist - Reversing Undocumented File Formats using a Hex Editor and your Brain

2009 CodeEngn Conference 03

There are broadly two ways to reverse an undocumented file format: reversing the program that reads the file, or reversing the file itself. This talk covers the second approach.

http://codeengn.com/conference/03

  1. Using a Hex Editor and your Brain (3rd CodeEngn ReverseEngineering Seminar, 2009.07.04)
  3. - When the program that reads the file is not accessible (e.g. firmware)
     - When you have the program, but it is too hard to reverse (packed, protected, etc.)
     - When you don't know how to reverse executable files, but still want to reverse a file format
     - Sometimes reversing the file itself is faster than reversing the executable (game archives)
  4. - Search Wotsit.org
     - Study the general structures of file formats
     - Using a hex editor, try to match the structure of the data with the structures you already know
     - "Look" at the hex bytes and try to find patterns
     - Make assumptions and verify them
     - If a theory proves correct, make more assumptions based on the new facts until the whole (or part) of the file format is revealed
  5. - Case I: readable but not writable
     - Case II: readable and writable
     - Case III: only one sample, neither readable nor writable
     ("Readable/writable" means the program that reads/writes the file is accessible.)
  7. - GRAIS (Game Resource Archive Identity Strings)
     - Offsets
     - Size fields
     - Number of files
     - Filenames
  8. - The file that contains the resources is bigger than the other files
     - Most values are stored in a 4-byte field
     - There is usually at least one meaningful field at the beginning of the file
     - Strings can be null-terminated, or fixed-length, in which case the string is padded with nulls
  9. OK… So I'm looking at a bunch of hex bytes… Now what???
  10. - File size = 1299033
      - The file size fits in 3 bytes (0 to 16777215)
      - Most fields are stored as 4 bytes
      - So every offset and size field will end with a null byte (the unused high byte, assuming little-endian storage)
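
The observation above can be turned into a quick scan for candidate fields. This is a minimal sketch, not the speaker's tool: the function name `candidate_fields`, the 4-byte alignment, and the little-endian byte order are assumptions made for illustration.

```python
import struct

def candidate_fields(data: bytes, file_size: int, align: int = 4):
    """Yield (position, value) for 4-byte little-endian values that could be
    offsets or sizes: nonzero and no larger than the file itself.
    ASSUMPTION: fields are little-endian and aligned to `align` bytes."""
    for pos in range(0, len(data) - 3, align):
        (value,) = struct.unpack_from("<I", data, pos)
        if 0 < value <= file_size:
            yield pos, value

# The file size from the slide, 1299033 = 0x0013D259, is stored as 59 D2 13 00:
# three meaningful bytes followed by a null byte.
size_bytes = struct.pack("<I", 1299033)
```

Running `candidate_fields` over the first few hundred bytes of an archive gives a short list of positions worth inspecting by hand in the hex editor.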
  11. GRAIF. Possibly an offset???
  12. Is indeed an offset :) Looks like a directory…
  14. Possibly the number of entries in the directory?
  15. 0x85E / 0x15 = 0x66. Verified! :)
  16. - The first field increases in each successive entry → possibly a file offset?
      - If the first theory is correct, the second field is very likely the file size
  17. Confirmed! :)
  19. Files extracted! :) Moving on…
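
The directory walk from the last few slides can be sketched as follows. Several things here are assumptions rather than facts from the talk: reading 0x85E as the directory size and 0x15 as the entry size is an interpretation of slide 15, the fields are taken to be little-endian 32-bit values, and the 13 bytes after the offset and size are left uninterpreted. `read_directory` and `extract` are hypothetical helper names.

```python
import struct

ENTRY_SIZE = 0x15                      # bytes per directory entry (slide 15)
DIR_SIZE = 0x85E                       # assumed to be the total directory size
ENTRY_COUNT = DIR_SIZE // ENTRY_SIZE   # 0x66 entries

def read_directory(archive: bytes, dir_offset: int, count: int = ENTRY_COUNT):
    """Return a list of (file_offset, file_size, rest) tuples.
    ASSUMPTION: each entry starts with two little-endian 32-bit fields;
    the remaining 13 bytes are returned as raw bytes."""
    entries = []
    for i in range(count):
        start = dir_offset + i * ENTRY_SIZE
        offset, size = struct.unpack_from("<II", archive, start)
        rest = archive[start + 8 : start + ENTRY_SIZE]
        entries.append((offset, size, rest))
    return entries

def extract(archive: bytes, entries):
    """Slice each file's bytes out of the archive."""
    return [archive[off : off + size] for off, size, _ in entries]
```

A useful sanity check while verifying the theory: each entry's offset plus size should equal the next entry's offset if the files are stored back to back.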
  20. - First File Offset
      - Archive Name
      - Filename Offset
      - Filename Directory Offset
      - Total File Data Size
      - Total Directory Size
      - Archive Size
      - Number Of Directories
      - Directory Offset
      - File Extension / Type
      - File ID
      - Archive Version
      - Filename Length
      - Decompressed File Size
      - Checksum
      - Timestamp
  25. - Max 40 chars
  26. - 11/1
      - 12/6
  28. - Possibly some kind of unique identifier
  29. Chunk Structure

      Size  Meaning
      2     Date
      40    Message
      2     Year
      4     Identifier
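
The chunk table maps directly onto a fixed-size record. The sketch below is illustrative only: the slides give the field sizes, but the byte order (little-endian), the split of the 2-byte date into a month byte followed by a day byte, and the name `parse_chunk` are all assumptions.

```python
import struct

# HYPOTHETICAL layout: month and day (1 byte each), 40-byte message,
# 16-bit year, 32-bit identifier; little-endian, no padding (48 bytes).
CHUNK = struct.Struct("<BB40sHI")

def parse_chunk(raw: bytes) -> dict:
    """Decode one 48-byte chunk into its four fields."""
    month, day, message, year, ident = CHUNK.unpack(raw[:CHUNK.size])
    return {
        "date": f"{month}/{day}",
        # The message is fixed-length, so strip the null padding.
        "message": message.rstrip(b"\x00").decode("ascii", errors="replace"),
        "year": year,
        "identifier": ident,
    }
```

If the decoded dates or years look implausible on real data, the byte order or the month/day split is the first thing to revisit.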
  30. You know what to do next! :)
  31. FileType | Unknown | Data Size | Checksum
  33. - All strings are absolutely incomprehensible
      - → Possibly compressed or encrypted
      - What to do next? → Frequency analysis! :)
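
Frequency analysis of a binary file just means counting how often each byte value occurs. A minimal sketch follows; the function names are made up for illustration, and plotting the resulting 256 values is left to whatever tool is at hand.

```python
from collections import Counter

def byte_frequencies(data: bytes):
    """Return a 256-entry list with the relative frequency of each byte value."""
    counts = Counter(data)
    total = len(data) or 1   # avoid dividing by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]

def top_bytes(data: bytes, n: int = 10):
    """Return the n most common (byte value, count) pairs."""
    return Counter(data).most_common(n)
```

Plain text and most uncompressed resources show a few dominant values (nulls, spaces, common letters), so the shape of this histogram is a quick hint about what was done to the data.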
  34. What can we understand from this?
  37. Highly symmetric → pattern → sign of a weak cipher!!!
  38. - Monoalphabetic (ruled out: frequency doesn't change)
      - Polyalphabetic
      - Transposition (ruled out: frequency doesn't change)
  39. - Why? Because it is the simplest keyed encryption algorithm to implement! :)
      - It also matches our theory of a polyalphabetic cipher, or more precisely, a Vigenère cipher

          char key[] = "1234567890";
          size_t key_len = strlen(key);
          for (size_t i = 0; i < filesize; i++) {
              /* repeating-key XOR: the key wraps around every key_len bytes */
              encrypted_data[i] = data[i] ^ key[i % key_len];
          }
  40. Key length: 32 bytes
  41. - Key is XORed
      - Key is Added
      - Key is Subtracted
      - Key is RORed
      - Key is ROLed
      - …
      - …
      Key found! :)
      RIMIDALV1AIDEMITLUMXOBEKUJSOHCRA
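
One common way to recover a repeating XOR key once its length is known is to split the data into one column per key byte and take the most frequent byte in each column. The sketch below is that generic technique, not necessarily how the speaker found the key. It assumes the operation is XOR and that the most frequent plaintext byte is 0x00 (typical for padded binary data); both function names are hypothetical.

```python
from collections import Counter

def guess_xor_key(cipher: bytes, key_len: int, common_plain: int = 0x00) -> bytes:
    """Guess a repeating XOR key of the given length.
    ASSUMPTION: in every key column the most frequent plaintext byte is
    `common_plain`. Requires len(cipher) >= key_len."""
    key = bytearray()
    for i in range(key_len):
        column = cipher[i::key_len]   # every byte encrypted with key[i]
        most_common_byte = Counter(column).most_common(1)[0][0]
        key.append(most_common_byte ^ common_plain)
    return bytes(key)

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Apply a repeating XOR key; the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

For the file in the talk one would call `guess_xor_key(data, 32)`. If the result is not readable, trying the other operations from the list (add, subtract, rotate) with the same column idea is the natural next step.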
  43. - Strings are weirdly distorted
      - 1 byte of trash in the middle of strings
      - What could this mean???
      - What to do next? → Calculate the entropy
  44. → File is possibly compressed
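
Entropy here is the Shannon entropy of the byte distribution, measured in bits per byte. A minimal sketch (the function name is chosen for illustration):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())
```

Values close to 8 suggest compressed or strongly encrypted data, while plain text usually lands well below that. The exact threshold is a judgment call, so it helps to compare against files of a known type.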
  45. FileType | Uncompressed Size | Compressed Size | Checksum
      - Compression ratio: 1.8
  46. - Run Length Encoding (ruled out: repeating bytes exist)
      - Arithmetic coding (ruled out: no text would remain)
      - Huffman (ruled out: no text would remain)
      - LZ77
      - LZ78
  47. - the_rain_<3,3>Sp<9,4>falls
      - <3,3>: go 3 bytes back, copy 3 bytes from there and paste them at the current location.
      - the_rain_in_Sp<9,4>falls
      - <9,4>: go 9 bytes back, copy 4 bytes from there and paste them at the current location.
      - the_rain_in_Spain_falls
      The compressed bytes info data (CBID) contains <position of the last matching sequence, length>
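
The back-reference expansion on this slide can be written in a few lines. This sketch works on an already tokenized stream, where literals are strings and CBIDs are `(distance, length)` tuples; that token representation and the name `lz77_expand` are illustrative choices, not part of any real file format.

```python
def lz77_expand(tokens):
    """Expand a list of literal strings and (distance, length) back-references."""
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)                 # literal characters
        else:
            distance, length = token
            for _ in range(length):
                # Copy one character at a time so that a reference may
                # overlap the text it is currently producing.
                out.append(out[-distance])
    return "".join(out)

tokens = ["the_rain_", (3, 3), "Sp", (9, 4), "falls"]
print(lz77_expand(tokens))  # the_rain_in_Spain_falls
```

Copying byte by byte matters: a reference such as `(2, 4)` after `"ab"` is valid and yields `"ababab"`.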
  48. - Every n bytes there is a header byte
      - The header byte probably tells the number of bytes until the next header byte, and the position of the CBID(s)
      - Header byte 0xFF means the next 8 bytes are uncompressed
  49. - We know that the header byte contains the following information:
        1. The number of bytes until the next header
        2. The position of the CBID(s)
      - Start attacking theory no. 1 by collecting samples (header byte : bytes until the next header).

      FB : 9    EF : 9    BE : 10   DF : 9
      FF : 8    EA : 11   AB : 11   AF : 10
      CF : 10   EB : 10   0F : 12   F9 : 10
      FE : 9    EE : 10   BF : 9    FD : 9
  50. - Compile the header bytes with the same length (9)
      - Use your brain and think: "Why do all those different bytes end up with the same length?"
      - Make assumptions. (Length info is contained in the high nibble | low nibble | bitfields | etc…)
      - Verify your theory by testing it on other samples.

      FB : 9    EF : 9    DF : 9    FE : 9    BF : 9
  51. - A 0 bit means length = 2
      - A 1 bit means length = 1
        ex) 0xBF = 10111111 = 7*1 + 1*2 = 9
            0xAC = 10101100 = 4*1 + 4*2 = 12
      - Moreover, a 0 is a CBID, while a 1 is a normal uncompressed byte. Read from right to left.
      - 0xDF = 11011111 = 5 uncompressed, CBID, 2 uncompressed
      - "media", \xD4\xA0, "OW"
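
The header-byte theory is easy to test mechanically against the collected samples. A small sketch, with `describe_header` as a made-up name:

```python
def describe_header(header: int):
    """Return (layout, total_bytes) for one header byte.
    Bits are read from the least significant bit upward ("right to left"):
    a 1 bit is a literal byte (1 byte), a 0 bit is a CBID (2 bytes)."""
    layout = []
    for bit in range(8):
        if (header >> bit) & 1:
            layout.append("literal")
        else:
            layout.append("cbid")
    total = sum(1 if kind == "literal" else 2 for kind in layout)
    return layout, total
```

`describe_header(0xDF)` gives five literals, one CBID, then two literals, for 9 bytes in total, which matches the "media", \xD4\xA0, "OW" example. The same function reproduces every length in the sample list from slide 49.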
  52. - Offsets found! :)
      - Verify for 5 or more header bytes just to make sure.
      - The theory is correct.
      - Something new learned: a CBID is 2 bytes
  53. - What we already know about the CBID:
        1. It contains the size of the compressed bytes
        2. It contains the position of the compressed bytes
      - Start attacking theory no. 1 by collecting samples.
  54. - "Firmw.are V..io." ≃ "Firmware Version"
        ∴ C8B0 → 3 bytes
      - "H{.igkei" = "Helligkei"
        ∴ 7B80 → 3 bytes
      - "Cre.a..Folder" = "Create Folder"
        ∴ B3A0 → 3 bytes
  55. ∴ C8B0 → 3 bytes
      ∴ 7B80 → 3 bytes
      ∴ B3A0 → 3 bytes
      - The last nibble is 0 for all of these CBIDs.
      - (Compressed bytes length = last nibble + 3)?
      - Verify by parsing the file using that assumption.
      - Matches! :)
      - The theory is correct.
      Uncompressed File Size
  56. - The compressed bytes are still unknown and are filled with zeroes
  57. - We know that the position field represents the position of the last occurrence of the matching bytes
      - We know from the length field that the minimum length of the matching bytes is 3
      - Collect samples where you can guess the compressed bytes, and where an identical byte sequence is visible somewhere before the compressed bytes.
  58. - "a…save" = "and save"
      - Compressed bytes = "nd "
      - Last occurred at 0x3826A, 0x48 bytes before the compressed bytes
      - CBID position field = 0x258
  59. - "DIS…NECTED" = "DISCONNECTED"
      - Compressed bytes = "CON"
      - Last occurred at 0x53B1, 0xB bytes before the compressed bytes
      - CBID position field = 0x39F
  60. - "al…dy exists" = "already exists"
      - Compressed bytes = "rea"
      - Last occurred at 0x13769, 0x6B9 bytes before the compressed bytes
      - CBID position field = 0x757
  61. - 0x258 matches neither 0x3826A nor 0x48
      - But 0x258 is very close to 0x26A!
      - In fact, 0x757 is also very close to 0x769, and 0x39F to 0x3B1
      - Carefully observing the difference between each pair of numbers, we realize that the difference is constant!
      - The difference is 0x12, therefore the format of the CBID is
        <low 3 nibbles of (offset of the last identical byte sequence - 0x12), length - 3>
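
The three samples can be checked against the derived formula directly. In this sketch the helper names are invented; the constants 0x12 and 3 and the sample values come from the slides.

```python
def expected_position_field(match_offset: int) -> int:
    """Position field = low 3 nibbles of (offset of the earlier match - 0x12)."""
    return (match_offset - 0x12) & 0xFFF

def match_length(last_nibble: int) -> int:
    """Length of the compressed run = last nibble of the CBID + 3."""
    return (last_nibble & 0xF) + 3

# (file offset of the earlier occurrence) -> (observed CBID position field)
samples = {0x3826A: 0x258, 0x53B1: 0x39F, 0x13769: 0x757}
for offset, field in samples.items():
    assert expected_position_field(offset) == field
```

Because only the low 12 bits survive, a position field can only address a 0x1000-byte window, so during decompression it must be resolved against the most recent output bytes.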
  62. - Using the above facts, uncompress the entire file and dump it into a new file.
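
Putting the header byte, the length nibble and the position field together gives a decompressor along these lines. Treat it as a sketch under stated assumptions, not the speaker's code: the slides do not spell out how the 12 position bits are split across the two CBID bytes, so the byte order used here (first byte = low 8 bits, high nibble of the second byte = top 4 bits) is a guess, as is the zero-filled initial window. The function name `decompress` is hypothetical.

```python
def decompress(src: bytes, out_size=None) -> bytes:
    """Expand a stream of header bytes, literals and 2-byte CBIDs."""
    window = bytearray(0x1000)       # ASSUMPTION: window starts zero-filled
    w = (0 - 0x12) & 0xFFF           # window index of output offset 0 (0xFEE)
    out = bytearray()
    i = 0

    def done():
        return out_size is not None and len(out) >= out_size

    def emit(byte):
        nonlocal w
        out.append(byte)
        window[w] = byte
        w = (w + 1) & 0xFFF

    while i < len(src) and not done():
        header = src[i]
        i += 1
        for bit in range(8):         # least significant bit first
            if done() or i >= len(src):
                break
            if (header >> bit) & 1:  # 1 = literal byte
                emit(src[i])
                i += 1
            else:                    # 0 = CBID (2 bytes)
                if i + 1 >= len(src):
                    i = len(src)     # truncated CBID: stop
                    break
                b0, b1 = src[i], src[i + 1]
                i += 2
                pos = b0 | ((b1 & 0xF0) << 4)   # ASSUMED bit layout
                length = (b1 & 0x0F) + 3
                for k in range(length):
                    emit(window[(pos + k) & 0xFFF])
    return bytes(out if out_size is None else out[:out_size])
```

Passing the uncompressed size from the file header as `out_size` stops the decoder cleanly at the end of the data. If the output still looks distorted, the assumed bit layout of the position field is the first thing to change.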
  63. - All strings are neatly displayed! Which means the whole file is uncompressed. :)