RecuperaBit: Forensic File System Reconstruction Given Partially Corrupted Metadata
Candidate: Andrea Lazzarotto
Supervisor: Ch. Prof. Riccardo Focardi
Università Ca’ Foscari Venezia
Dorsoduro 3246
30123 Venezia
FILE SYSTEM RECONSTRUCTION
Work:
• File system reconstruction from damaged metadata
• Detection of partition geometry
• Tests against similar software
Motivation:
• File system analysis is used in many investigations
• Carving does not provide context
• File systems may be damaged
CONTENTS
1. FORENSIC FILE SYSTEM ANALYSIS
Problem definition and NTFS features
2. FILE SYSTEM RECONSTRUCTION ALGORITHM
Tree reconstruction and partition detection
3. SOFTWARE IMPLEMENTATION
Test results
FORENSIC FILE SYSTEM ANALYSIS
PROBLEM DEFINITION
Problem (Forensic File System Reconstruction). Develop an algorithm that reconstructs the directory structure of one or more types of file systems.
INPUT
1. Bitstream copy of drive
2. File system types to search
OUTPUT
Files divided into Root and Lost Files, for each detected file system.
FILE SYSTEM STRUCTURE
5 Root
0 $MFT
1 $MFTMirr
2 $LogFile
3 $Volume
4 $AttrDef
6 $Bitmap
7 $Boot
8 $BadClus
8:$Bad $BadClus:$Bad
9:$SDS $Secure:$SDS
9 $Secure
10 $UpCase
11 $Extend
25 $ObjId
24 $Quota
26 $Reparse
66 bbb.txt
64 interesting
65 aaa.txt
−1 LostFiles
67 Dir_67
68 another
NTFS
Interesting artifacts:
• Boot sectors → partition geometry
• MFT entries → identifier, name, timestamps of files
• Index records → contents of directories
CORRUPTED METADATA (EXAMPLE)
[Figure: a hard drive holding a new file system and an old file system, with the boot sector, MFT, MFT mirror and backup boot sector marked, and the resulting layout.]
FILE SYSTEM RECONSTRUCTION ALGORITHM
DISK SCANNING
• The disk is scanned for artifacts (metadata carving)
• File records are clustered in partitions
• For NTFS: p = y − sx where s = 2
Hard drive
Sector y          3014  3016  3018  3020  3022  3024  3026  3028
Entry number x      29    30    31    32    33   102   103   104
Value of p        2956  2956  2956  2956  2956  2820  2820  2820
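The clustering step can be sketched in a few lines of Python. This is a minimal illustration, not RecuperaBit's actual code: the function name `cluster_by_partition` is hypothetical, and reading s as the number of sectors occupied by one MFT entry is an assumption. Under that reading, p is the sector where entry 0 would sit, so records sharing the same p are likely to belong to the same partition.

```python
from collections import defaultdict

# s in the formula p = y - s*x (assumed here to be the sectors per MFT entry)
SECTORS_PER_ENTRY = 2


def cluster_by_partition(records, s=SECTORS_PER_ENTRY):
    """Group (sector y, entry number x) pairs by p = y - s * x."""
    clusters = defaultdict(list)
    for y, x in records:
        clusters[y - s * x].append((y, x))
    return dict(clusters)


# The eight file records from the table above.
records = [
    (3014, 29), (3016, 30), (3018, 31), (3020, 32), (3022, 33),
    (3024, 102), (3026, 103), (3028, 104),
]
clusters = cluster_by_partition(records)
for p, members in sorted(clusters.items()):
    print(p, [x for _, x in members])
# 2820 [102, 103, 104]
# 2956 [29, 30, 31, 32, 33]
```

The two values of p split the records into two groups, matching the last row of the table.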
DIRECTORY TREE RECONSTRUCTION
Each node is linked to its parent (bottom-up reconstruction).
When the parent is not available, a ghost entry is created under Lost Files.
[Figure: a node linked upwards to its parent.]
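A minimal sketch of the bottom-up linking with ghost entries follows. The function `build_tree`, its input format and the parent links in the sample data are assumptions made for illustration. Only the identifiers, the `Dir_67` ghost name and the −1 identifier of Lost Files are taken from the example tree shown earlier.

```python
LOST_FILES = -1  # identifier of the Lost Files node in the example tree


def build_tree(nodes, root_id=5):
    """Link every node to its parent, creating ghost directories when needed.

    nodes maps an entry id to a (name, parent id) pair.
    Returns (names, children), where children maps a parent id to its child ids.
    """
    names = {i: name for i, (name, _) in nodes.items()}
    parents = {i: parent for i, (_, parent) in nodes.items()}
    names[LOST_FILES] = "LostFiles"
    children = {}
    pending = [i for i in nodes if i != root_id]
    while pending:
        node = pending.pop()
        parent = parents[node]
        if parent not in names:
            # The parent record was never found: add a ghost under Lost Files.
            names[parent] = "Dir_%d" % parent
            parents[parent] = LOST_FILES
            pending.append(parent)
        children.setdefault(parent, []).append(node)
    return names, {p: sorted(c) for p, c in children.items()}


# Hypothetical parent links; entry 67 is missing from the recovered records.
nodes = {
    5: ("Root", 5),
    64: ("interesting", 5),
    65: ("aaa.txt", 64),
    66: ("bbb.txt", 5),
    68: ("another", 67),
}
names, children = build_tree(nodes)
print(names[67], children[LOST_FILES], children[67])
```

Because entry 68 points to a parent that was not recovered, the sketch creates `Dir_67` under Lost Files and attaches `another` to it, while the remaining nodes stay reachable from Root.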
PARTITION GEOMETRY
Needed for extracting file contents and accessing external attributes in NTFS (including index records).
Parameters:
• SPC (Sectors per Cluster)
• CB (Cluster Base) → where the file system starts
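The role of the two parameters can be illustrated with a one-line conversion from a cluster number to an absolute sector. This assumes CB is the sector at which cluster 0 lies; the function name and the sample values are hypothetical.

```python
def cluster_to_sector(cluster, spc, cb):
    """Absolute sector of a cluster, assuming CB is the sector of cluster 0."""
    return cb + cluster * spc


# Hypothetical geometry: 8 sectors per cluster, file system starting at sector 2048.
print(cluster_to_sector(100, spc=8, cb=2048))  # 2848
```

Without both values, a runlist expressed in clusters cannot be mapped back to positions on the disk image.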
INFERENCE OF PARTITION GEOMETRY
Procedure:
1. Fingerprinting index records
2. Generation of text (from disk)
3. Generation of patterns (from partitions and SPC enumeration)
4. Matching
TEXT GENERATION (EXAMPLE)
The following index records are found on disk:
Sector   Owner ID
54       14
62       23
78       14
The resulting text is:
. . . ∅ ∅ 14 ∅ ∅ ∅ ∅ ∅ ∅ ∅ 23 ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ 14
(positions of the non-empty symbols: sectors 54, 62 and 78)
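One way to represent such a text in code is a sparse mapping from sector to owner ID, with every missing sector standing for ∅. The helper names `build_text` and `render` below are hypothetical, and the sparse representation is a design choice of this sketch rather than something stated in the slides.

```python
def build_text(index_records):
    """Map each sector holding an index record to the ID of its owner."""
    return {sector: owner for sector, owner in index_records}


def render(sparse, start, end, empty="∅"):
    """Print-friendly view of the positions start..end (inclusive)."""
    return " ".join(str(sparse.get(i, empty)) for i in range(start, end + 1))


text = build_text([(54, 14), (62, 23), (78, 14)])
print(render(text, 52, 78))
```

A sparse mapping keeps memory proportional to the number of index records found, not to the size of the drive.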
PATTERN GENERATION (EXAMPLE)
Given the file records:
MFT entry   Pointers to records (runlist)
14          11, 17
23          13
SPC = 1 → ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅ 14 ∅ 23 ∅ ∅ ∅ 14
(the pattern starts at position 0; the non-empty symbols sit at positions 11, 13 and 17)
SPC = 2 → . . . ∅ ∅ 14 ∅ ∅ ∅ 23 ∅ ∅ ∅ ∅ ∅ ∅ ∅ 14
(positions of the non-empty symbols: 22, 26 and 34)
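The pattern for a given SPC value can be sketched by multiplying each cluster number in the runlists by SPC. The function `build_pattern` and the dictionary layout of the runlists are assumptions of this sketch.

```python
def build_pattern(runlists, spc):
    """Place each MFT entry ID at the sector offset of its index records.

    runlists maps an MFT entry ID to the cluster numbers of its index records.
    """
    return {
        cluster * spc: entry
        for entry, clusters in runlists.items()
        for cluster in clusters
    }


runlists = {14: [11, 17], 23: [13]}
for spc in (1, 2):
    print(spc, sorted(build_pattern(runlists, spc).items()))
# 1 [(11, 14), (13, 23), (17, 14)]
# 2 [(22, 14), (26, 23), (34, 14)]
```

The offsets reproduce the positions shown in the two example patterns.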
APPROXIMATE STRING MATCHING
Each pattern is matched against the text.
The best match provides both the CB and SPC parameters.
We use an optimized version of the Baeza-Yates–Perleberg algorithm for approximate string matching.
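The matching step can be illustrated with a brute-force stand-in. The sketch below is not the Baeza-Yates–Perleberg algorithm: it simply lets every agreeing pair of pattern and text symbols vote for the shift that would align them, and keeps the SPC and shift with the most votes. The function name and the list of SPC candidates (powers of two up to 128) are assumptions.

```python
from collections import Counter


def best_geometry(text, runlists, spc_candidates=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Return (score, SPC, CB) for the alignment with the most matching symbols."""
    best = None
    for spc in spc_candidates:
        votes = Counter()
        for entry, clusters in runlists.items():
            for cluster in clusters:
                offset = cluster * spc
                for sector, owner in text.items():
                    if owner == entry and sector >= offset:
                        # This shift would put the pattern symbol on the text symbol.
                        votes[sector - offset] += 1
        if votes:
            cb, score = votes.most_common(1)[0]
            if best is None or score > best[0]:
                best = (score, spc, cb)
    return best


# Data from the text and pattern generation examples.
text = {54: 14, 62: 23, 78: 14}
runlists = {14: [11, 17], 23: [13]}
print(best_geometry(text, runlists))  # (3, 4, 10)
```

On the example data, neither SPC = 1 nor SPC = 2 aligns more than one record, while SPC = 4 with CB = 10 aligns all three (10 + 4 · 11 = 54, 10 + 4 · 13 = 62, 10 + 4 · 17 = 78). A dedicated approximate matching algorithm serves the same purpose far more efficiently on real drives.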
SOFTWARE IMPLEMENTATION
RECUPERABIT
RecuperaBit is the software implementation of our reconstruction algorithm:
• Modular program written in Python
• Full implementation for NTFS reconstruction
• Extensible through additional plug-ins
EXPERIMENTS
Test results:
• RecuperaBit was tested against 9 existing programs
• 4 different hard drive images were considered
• The final test involves increasing damage on one drive
FILE SYSTEM DETECTION
Software             #1   #2        #3        #4
Gpart                OK   OK        Nothing   Partial
TestDisk             OK   OK        Nothing   OK (+1)
Autopsy              OK   Partial   Nothing   OK
Scrounge-NTFS        OK   OK        Nothing   OK
Restorer Ultimate    OK   OK        OK        OK
DMDE                 OK   OK        OK        OK (+3)
Recover It All Now   OK   Nothing   Nothing   OK
GetDataBack          OK   OK        Nothing   OK (+1)
SalvageRecovery      OK   OK        Nothing   OK (+1)
RecuperaBit          OK   OK        OK        OK × 2 (+302)
DIRECTORY TREE ACCURACY
Software             #1         #2         #3         #4
TestDisk             Perfect    Error      —          Error
Autopsy              Perfect    No files   —          Good
Scrounge-NTFS        Partial    Terrible   Terrible   Terrible
Restorer Ultimate    Perfect    Partial    Perfect    Good
DMDE                 Perfect    Error      Perfect    Good
Recover It All Now   Terrible   —          —          No files
GetDataBack          Perfect    Good       —          Good
SalvageRecovery      Perfect    Terrible   —          Perfect
RecuperaBit          Perfect    Perfect    Perfect    Perfect
RECOVERED FILE CONTENTS
Software             Sparse   Compressed    Encrypted
TestDisk             OK       OK            Empty
Autopsy              Empty    OK            OK
Scrounge-NTFS        OK       Unsupported   OK
Restorer Ultimate    OK       OK            OK
DMDE                 OK       OK            Unsupported
Recover It All Now   OK       Wrong         OK
GetDataBack          Empty    OK            OK
SalvageRecovery      Empty    Wrong         OK
RecuperaBit          OK       Unsupported   OK
OUTPUT QUALITY VS CORRUPTION LEVEL
[Figure: number of files (0 to 19399) against the percentage of damaged sectors (0% to 100%), with two series: all detected files, and files unreachable from Root.]
CONCLUSION
Contributions:
• Generic bottom-up reconstruction algorithm
• Strategy for partition geometry detection (NTFS)
Results:
• Successful reconstruction in all tested cases
• Sometimes better than commercial programs
