3. • The process of reassembling files from
disk fragments in the absence of
metadata.
What is file carving?
4. • Accidental user deletions.
• Intentional user deletions.
• Malware.
When would we need file carving?
5. Using .jpeg file as an example :
•Find header (FF D8).
•Know footer pair (FF D9).
•Find all contiguous data.
Traditional file carving method
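The three steps above can be sketched in a few lines of Python. This is only an illustration of header-footer pairing, not the code of any particular carving tool; the function name `carve_jpegs` is hypothetical, and the sketch assumes every file is stored contiguously.

```python
JPEG_HEADER = b"\xff\xd8"  # start-of-image marker (FF D8)
JPEG_FOOTER = b"\xff\xd9"  # end-of-image marker (FF D9)


def carve_jpegs(image: bytes) -> list[bytes]:
    """Return every contiguous header..footer span found in a raw image.

    Hypothetical helper: assumes no fragmentation, so everything between
    a header and the next footer is taken to belong to one file.
    """
    carved = []
    pos = 0
    while True:
        start = image.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break
        stop = end + len(JPEG_FOOTER)
        carved.append(image[start:stop])
        pos = stop  # continue searching after this file
    return carved
```

Because the sketch simply pairs each header with the next footer, any foreign block lying between the two ends up inside the carved file, which is the fragmentation problem the next slide describes.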
6. •Fragmentation.
•Doesn’t work without exact header and
footer information.
•Doesn’t work with all file types.
o focuses on documents of forensic interest.
o binary executables not included.
Problems with traditional method
7. • Recover Executable and Linkable Format
(ELF) file e from disk image D
• D strictly consists of file content blocks
• Assume D is an EXT2 file system, block
size 4k
Bin-carver overview -1
8. • File content has not been overwritten.
• File content is stored in increasing order.
• ELF file e has n blocks on the disk.
• We want to link these n blocks together
using internal graph node logic.
Bin-carver overview -2
12. • ELF-header scanner.
o scan all possible ELF headers hi using ELF-file
magic value.
• Block node linker.
o scans disk image, identifies nodes and links them.
• Conflict-node resolver.
o removes conflict nodes and outputs ELF-file ei.
Components
13. • Headers hold a “road map” describing
ELF file organization.
• Searching for the magic number sequence
7f 45 4c 46 locates the headers, which
tell us how to traverse all other sections.
Scanner -1
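A minimal sketch of the magic-value scan, assuming (as on the overview slide) 4k blocks and that a file always begins at the start of a block. The name `scan_elf_headers` is hypothetical and is not taken from Bin-Carver itself.

```python
ELF_MAGIC = b"\x7fELF"  # 7f 45 4c 46
BLOCK_SIZE = 4096       # assumed 4k block size


def scan_elf_headers(image: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the block numbers whose first four bytes are the ELF magic."""
    hits = []
    for offset in range(0, len(image), block_size):
        # Only block-aligned offsets are checked: a file's first block
        # is assumed to start on a block boundary.
        if image[offset:offset + 4] == ELF_MAGIC:
            hits.append(offset // block_size)
    return hits
```

Checking only block-aligned offsets keeps the scan cheap and skips magic-like byte runs that happen to sit in the middle of a block.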
14. Each ELF header is 52 bytes (32-bit ELF) and points to:
• Program header table (PHT)
o array of program headers
• Section header table (SHT)
o array of section headers
Scanner -2
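To make the "road map" concrete, here is a sketch that unpacks the header fields locating the PHT and SHT. It assumes the 32-bit little-endian ELF layout, whose header packs into 52 bytes; the `Elf32Header` type and `parse_elf32_header` function are illustrative names, not part of the tool.

```python
import struct
from typing import NamedTuple

# ELF32 header: 16-byte e_ident followed by fixed-width fields.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")


class Elf32Header(NamedTuple):
    e_ident: bytes
    e_type: int
    e_machine: int
    e_version: int
    e_entry: int
    e_phoff: int      # file offset of the program header table
    e_shoff: int      # file offset of the section header table
    e_flags: int
    e_ehsize: int
    e_phentsize: int  # size of one program header
    e_phnum: int      # number of program headers
    e_shentsize: int  # size of one section header
    e_shnum: int      # number of section headers
    e_shstrndx: int


def parse_elf32_header(block: bytes) -> Elf32Header:
    """Parse the ELF32 header at the start of a candidate block."""
    if block[:4] != b"\x7fELF":
        raise ValueError("block does not start with the ELF magic value")
    return Elf32Header(*ELF32_HEADER.unpack_from(block))
```

The offsets `e_phoff` and `e_shoff`, together with the entry counts and sizes, are what later steps use to work out where each table must lie.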
15. • Usually located at end of ELF file.
o can serve as a footer because of this.
• Since A(footer) > A(hi), we can start our search
at the 0x14 disk block.
• Gives a multitude of other constraints that
allow us to calculate the location of the
footer.
Searching SHT
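One such constraint can be worked out directly from the header fields. The sketch below assumes the SHT really is the last thing in the file, so its end marks the end of the file; `sht_extent` is a hypothetical helper, and block numbers are counted relative to the header block.

```python
def sht_extent(e_shoff: int, e_shnum: int, e_shentsize: int,
               block_size: int = 4096) -> tuple[int, int, int]:
    """Return (first SHT block, last SHT block, total blocks in file).

    Assumes the section header table closes the file, so the file length
    is e_shoff + e_shnum * e_shentsize.
    """
    end = e_shoff + e_shnum * e_shentsize  # one past the last SHT byte
    first_block = e_shoff // block_size
    last_block = (end - 1) // block_size
    total_blocks = last_block + 1
    return first_block, last_block, total_blocks
```

For example, with `e_shoff = 10000`, 20 section headers of 40 bytes each, and 4k blocks, the table ends at byte 10800, so the footer sits in block 2 and the file spans 3 blocks.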
16. •Locates segments that create memory
image of the program.
•Each program header is 32 bytes.
•Usually starts right after ELF headers.
o same 4k block.
Searching PHT
17. •From program header, infer base virtual
address of image file.
•Keep iterating and build the road map.
•The goal is to fill every slot in this road
map with content (bi).
Searching PHT
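A sketch of how the PHT could be turned into a road map, assuming 32-bit little-endian program headers (eight 4-byte fields, 32 bytes each). The helper names are hypothetical, and taking the base address as the lowest `PT_LOAD` virtual address rounded down to a page is the usual ELF convention rather than something stated on the slides.

```python
import struct

PT_LOAD = 1                  # loadable segment type
PHDR = struct.Struct("<8I")  # p_type, p_offset, p_vaddr, p_paddr,
                             # p_filesz, p_memsz, p_flags, p_align


def parse_program_headers(data: bytes, e_phoff: int, e_phnum: int) -> list[tuple]:
    """Unpack e_phnum program headers starting at file offset e_phoff."""
    return [PHDR.unpack_from(data, e_phoff + i * PHDR.size)
            for i in range(e_phnum)]


def base_vaddr(phdrs: list[tuple], page_size: int = 4096) -> int:
    """Lowest PT_LOAD virtual address, truncated to a page boundary."""
    loads = [ph[2] for ph in phdrs if ph[0] == PT_LOAD]
    if not loads:
        raise ValueError("no PT_LOAD segment found")
    return min(loads) & ~(page_size - 1)


def road_map(phdrs: list[tuple], block_size: int = 4096) -> dict[int, int]:
    """Map each file block covered by a PT_LOAD segment to the virtual
    address of that block's first byte. Each key is a slot to be filled."""
    slots: dict[int, int] = {}
    for p_type, p_offset, p_vaddr, _paddr, p_filesz, *_rest in phdrs:
        if p_type != PT_LOAD or p_filesz == 0:
            continue
        first = p_offset // block_size
        last = (p_offset + p_filesz - 1) // block_size
        for b in range(first, last + 1):
            slots.setdefault(b, p_vaddr - p_offset + b * block_size)
    return slots
```

Each key of the returned dictionary is a placeholder that still needs a disk block bi assigned to it; the virtual address is what later lets call targets be translated back into block numbers.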
18. • With no fragmentation, job is done.
• But, with any garbage gap, this approach
would fail.
• So how to link each individual bi if the
disk is fragmented?
Finished?
19. • Have to logically connect bi and bj.
• Explore the caller-callee relationship:
• Fill block place of bcaller and bcallee
o find address
• Logically link them together.
o function prologue signature (local calls)
o PLT instruction sequence (library calls)
Block-node linker -1
20. • On a library call
o Use PLT block number as an anchor.
o Use this anchor to identify absolute block number of the
caller block.
• On a local call
o Only determines distance.
o Only works with blocks starting with e8 (CALL opcode).
• In most cases, library calls are used to resolve
block numbers.
Block-node linker -2
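The local-call case can be illustrated as follows. On x86, `e8` is followed by a signed 32-bit displacement measured from the end of the 5-byte instruction, so the displacement gives the distance between caller and callee blocks but not their absolute positions. The sketch assumes 32-bit x86 code and the common `push ebp; mov ebp, esp` prologue (`55 89 e5`); `call_targets` and `is_linked` are hypothetical names.

```python
import struct

PROLOGUE = b"\x55\x89\xe5"  # push ebp; mov ebp, esp (assumed signature)


def call_targets(block: bytes, block_size: int = 4096) -> list[tuple[int, int, int]]:
    """For every e8 byte, return (offset in block, block distance to the
    target, offset of the target inside its block).

    Every e8 byte is treated as a candidate CALL, so false positives
    are expected and must be filtered by the prologue check.
    """
    found = []
    for i in range(len(block) - 4):
        if block[i] != 0xE8:
            continue
        (rel32,) = struct.unpack_from("<i", block, i + 1)
        target = i + 5 + rel32  # relative to the start of this block
        found.append((i, target // block_size, target % block_size))
    return found


def is_linked(caller: bytes, callee: bytes, distance: int,
              block_size: int = 4096) -> bool:
    """True if some CALL in `caller` lands on a function prologue in
    `callee`, assuming `callee` sits `distance` blocks after `caller`."""
    for _, dist, off in call_targets(caller, block_size):
        if dist == distance and callee[off:off + len(PROLOGUE)] == PROLOGUE:
            return True
    return False
```

A link found this way only fixes the relative distance between two blocks, which is why the slides rely on PLT anchors from library calls to pin down absolute block numbers.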
21. • A particular placeholder i could have
several candidates.
• To eliminate redundant placeholders:
o Use identified non-conflict nodes
o Explore logic connections
o Resolve node
o Iterate through until a fixed point is reached
Conflict-node resolver -1
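The fixed-point iteration can be sketched as below. The pruning rule used here is an assumption chosen for illustration: it relies on the earlier premise that file content is stored in increasing order, so a candidate is dropped whenever it would break that order against an already resolved placeholder. The function name `resolve` and the dictionary layout are hypothetical.

```python
def resolve(candidates: dict[int, set[int]]) -> dict[int, set[int]]:
    """Prune candidate disk blocks per placeholder until nothing changes.

    candidates maps placeholder index -> set of candidate disk blocks.
    A placeholder with exactly one candidate counts as resolved.
    """
    cands = {slot: set(blocks) for slot, blocks in candidates.items()}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        resolved = {slot: next(iter(blocks))
                    for slot, blocks in cands.items() if len(blocks) == 1}
        for slot, blocks in cands.items():
            if len(blocks) <= 1:
                continue
            # Keep a candidate only if it preserves increasing order
            # relative to every resolved (non-conflict) node.
            keep = {
                blk for blk in blocks
                if all(blk < rblk if slot < rslot else blk > rblk
                       for rslot, rblk in resolved.items())
            }
            if keep != blocks:
                cands[slot] = keep
                changed = True
    return cands
```

Each pass can turn a conflicting placeholder into a resolved one, which in turn gives the next pass more anchors to prune with; placeholders that still hold several candidates at the fixed point are left for the later data-block and execution checks.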
22. • Block-node linker only focuses on linking
code blocks. Conflict-node resolver
handles other data blocks (.data,
.debug).
Conflict-node resolver -2
23. To retrieve data blocks:
• Treat data sections as a block between the ELF header and
the first block of code section.
• The resolver explores constraints defined in the PHT and SHT.
• Worst case scenario: data section does not have identifiable
sections and we must use dynamic execution to eliminate
bogus permutations.
o Essentially, if the recovered binary file doesn’t crash, it
may have been recovered successfully.
Conflict-node resolver -3
24. • Comparisons with other similar tools were
intended, but neither Foremost nor
Scalpel supports carving of
fragmented ELF binary files.
Evaluation - Comparison
26. • All files are ELF binaries.
o worst case, high false positive rates.
o addition of heterogeneous data irrelevant.
• Performance of algorithm is invariant to
size of the disk.
• Performance relies on number of files to be
recovered.
Evaluation -2
27. • To evaluate accuracy, need to prove the
recovered files are true ELF files.
• Need to create an MD5 hash of first block
and every individual block for each true
ELF binary to detect true data in worst
case fragmentation scenario.
Evaluation -3
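A sketch of the per-block hashing idea, using Python's standard `hashlib`. It assumes 4k blocks and that a block on disk is byte-identical to the corresponding block of the original file; the names `block_hashes` and `locate_blocks` are hypothetical, and slack space after a short final block is ignored.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """MD5 digest of each block of `data`, in order."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def locate_blocks(truth: bytes, image: bytes,
                  block_size: int = 4096) -> dict[int, int]:
    """Map each block index of the true ELF file to the first disk-image
    block with the same MD5. Blocks that are not found are omitted."""
    index: dict[str, int] = {}
    for n, digest in enumerate(block_hashes(image, block_size)):
        index.setdefault(digest, n)
    return {i: index[d]
            for i, d in enumerate(block_hashes(truth, block_size))
            if d in index}
```

Comparing this mapping with the blocks a carver actually assembled shows whether the recovered file is made of true data, however the blocks were scattered.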
28. Identification rate:
• Shows portion that can be identified no
matter how fragmented the disk is.
o must be able to match hash values
Recovery rate:
• Valid files in the system that were
identified and recovered.
Effectiveness -1
29. Overall, very effective. On average:
• Identification rate of 96.3%
• Recovery rate of 93.1%
Effectiveness -2
31. • All performance slowdowns occur during
linker and resolver phases.
• Large gaps hurt performance, and the
large number of caller-callee instructions
cause performance penalties.
Runtime Analysis -1
33. Conclusion
• Bin-Carver is a tool for dissecting, mapping, and recovering
binary executable files from raw binary data.
• Bin-Carver is extremely accurate, and much better than all
the existing file carving techniques when recovering binary
files with fragmentation.
• Bin-Carver also provides a useful complement to the more
traditional header-footer pairing approach for file carving to
gain more complete disk image recovery.