
Watching the Detectives: Using digital forensics techniques to investigate the digital persona



Presentation on digital forensic techniques given at King's College London on November 8th, 2011.

Cover image taken from http://www.flickr.com/photos/mdales/5641401670/

  1. Watching the Detectives: Using digital forensic techniques to investigate the digital persona
     Gareth Knight, Centre for e-Research
     Anatomy Museum, King's College London, 8th November 2011
  2. Overview
     • Introduction to digital forensics
        • How is it used in law enforcement?
        • How can it be used for research and digital curation?
     • Forensic practices
        • Media imaging
        • Hash filtering
        • Data carving
     • Current/future challenges
  3. Origin of Digital Forensics
     • Emerged in the 1980s as a response to the increasing use of electronic devices for criminal activity.
     • Practitioner-led approach: a set of methods applied to gather, retrieve and analyse potential evidence held on digital devices.
     • Emphasis upon "scientifically derived and proven methods" to obtain, analyse and report upon digital evidence (Digital Forensics Research Workshop, 2001).
     • Legal acceptability is influenced by the Daubert Standard. Methods must:
        • be tested,
        • be subject to peer review and publication,
        • possess a known error rate,
        • be subject to standards governing their application.
  4. Intelligence gathering in law enforcement
     • Role in legal Disclosure (UK)/e-discovery (US) to obtain data designated as evidence in a legal investigation.
     • Broad intelligence-gathering activities: develop and test hypotheses.
     • Several intelligence cycles have been developed to model the investigation process.
     (Figures: Robert Clark's target-centric approach; Peter Pirolli and Stuart Card's sense-making loop)
  5. Value for digital archiving and research
     • Increasing amount of digital information
     • Analysis of research activities:
        • When did an author create a notable work?
        • What tools did they use?
        • What sources did they consult?
        • Is there evidence of material they abandoned?
     • Business function: staff have their machine appraised prior to leaving the institution/finishing a project, to identify data of long-term value not held elsewhere.
     (Figure: Salman Rushdie Archive, emulation of several Apple Macs owned by the author, http://www.emory.edu/home/academics/libraries/salman-rushdie.html)
  6. Digital Forensics workflow
     Forensic activities, as described by the Digital Forensics Research Workshop (2001):
     Preservation, Collection, Validation, Identification, Analysis, Interpretation, Documentation, Presentation
     (Figure: these activities grouped into three phases: Acquisition, Analysis, Reporting)
  7. Data Acquisition
     The act of obtaining possession of digital data for subsequent analysis. Commonly achieved by creating a disk image or clone that provides a bit-for-bit copy of the disk.
     (Figure: a 60GB hard disk is imaged into one or more files that add up to 60GB)
     Motivation for creating a disk image in a forensic environment:
     1. A backup copy avoids the risk of media failure or other damage during use.
     2. It avoids the risk of making an inadvertent, unrecoverable change to the primary copy.
        • Files can be created/modified/deleted simply through access to the disk.
     3. It enables analysis using methods and tools that are not possible/available in the original environment (e.g. emulation, text mining).
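At its core, imaging is a sequential bit copy, usually with a hash computed during the copy so the image can be verified later. The following is a minimal Python sketch, assuming the source device (or a file standing in for it) can be opened for reading behind a write blocker. The function name `image_device` is hypothetical. Unlike the DDRescue or dc3dd tools listed on the next slide, it makes no attempt to retry or skip unreadable sectors.

```python
import hashlib


def image_device(source_path, image_path, chunk_size=1024 * 1024):
    """Copy source_path to image_path byte for byte, hashing on the fly.

    Returns (bytes_copied, md5_hex, sha1_hex) so the image can later be
    checked against the values recorded at acquisition time.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    copied = 0
    # Open the source read-only. A hardware write blocker is still advisable,
    # because the OS itself may write to an attached disk (e.g. when mounting it).
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device/file
                break
            dst.write(chunk)
            md5.update(chunk)
            sha1.update(chunk)
            copied += len(chunk)
    return copied, md5.hexdigest(), sha1.hexdigest()
```

On Linux a call might look like `image_device("/dev/sdb", "evidence.dd")`. The device path here is hypothetical, and reading a raw device normally requires root privileges. You would record the returned hashes alongside the image.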
  8. Forensic Utility Belt
     (1) Capture software: stored on bootable media (floppy, CD, USB). Examples: dc3dd, DDRescue, OSFClone, FTK Imager.
     (2) Write blocker: prevents the OS from writing to connected devices, e.g. a USB plug-through unit.
     (3) Access devices: a drive enclosure allows use of internal disks via USB; the KryoFlux USB controller allows low-level disk access.
     (4) Destination media: digital media to which the disk image will be written, e.g. a USB hard disk.
  9. Key Questions to be addressed
     1. What type of media do you want to capture?
        • Floppy disk, hard disk, optical media
     2. How can the data be accessed?
        • Hard disk installed within the user's computer
        • Accessed using an appropriate reader (USB hard disk caddy, floppy disk reader, CD/DVD reader)
        • Network-connected disk
     3. Where will the acquired image be stored?
        • External USB disk
        • Network device over Ethernet/serial, etc.
     4. What software should you use to capture the disk image?
     (Figures: different hardware; different media)
  10. Data Analysis
      Content held on digital media serves many purposes:
      • Operating system files, e.g. Windows has 30,000+ files after a fresh install
      • Software: applications, utilities, games, etc.
      • Log data: Windows Registry, browser cache, cookies, temp files
      • User-generated content: documents, images, sound, emails, etc.
      Different data layers are available:
      1. Active data: information readily available, as normally seen by an OS
      2. Inactive/residual data: information that has been deleted or modified
         • Deleted files located in unallocated space that have yet to be overwritten (retrieved using an undelete application)
         • Data fragments that contain information from a partially deleted file (retrieved through carving)
      Inactive data is useful, but ethical issues need to be considered.
  11. Locating active files
      Common techniques for locating user content:
      • Navigate the directory structure to get a 'feel' for the data files held on disk
      • Search by:
         • File name, e.g. *report*
         • File type, e.g. *.doc, *.pdf, etc.
         • Creation/modification date
         • Content type, e.g. word usage
         • File size
      • Additional parameters are configurable
      Windows search is easy to perform but does not identify everything, and the investigation process can leave artefacts (e.g. thumbs.db) behind.
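Several of the search criteria on this slide can be combined in a single directory walk. The following is a minimal Python sketch, assuming it is run against a read-only mounted copy of the disk image rather than the original disk. The function name `find_files` and its parameters are hypothetical, and searching by content (word usage) is not covered.

```python
import fnmatch
import os
from datetime import datetime


def find_files(root, name_pattern="*", extensions=None, modified_after=None,
               min_size=0, max_size=None):
    """Yield paths under root that match every supplied criterion."""
    exts = {e.lower() for e in extensions} if extensions else None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # File name, e.g. "*report*" (case-insensitive)
            if not fnmatch.fnmatch(name.lower(), name_pattern.lower()):
                continue
            # File type, judged by extension only, e.g. [".doc", ".pdf"]
            if exts is not None and os.path.splitext(name)[1].lower() not in exts:
                continue
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # metadata only; the file content is not opened
            # Modification date
            if modified_after is not None and datetime.fromtimestamp(st.st_mtime) < modified_after:
                continue
            # File size in bytes
            if st.st_size < min_size or (max_size is not None and st.st_size > max_size):
                continue
            yield path
```

For example, `find_files("/mnt/image", "*report*", [".doc", ".pdf"])` lists report-like documents. The mount point in this example is hypothetical.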
  12. Case Management Tools
      A common interface for analysing a drive without changing its content.
      • Commercial: FTK, OSForensics
      • OSS: Sleuthkit/Autopsy, Digital Forensics Framework, PyFlag
      Provide tools to sort/visualise data by: name, folder, size, type, creation/modification date, hash set.
  13. Identifying user data using checksums
      • A checksum algorithm applied to a file generates a distinct (possibly unique) alphanumeric value.
      • There are many different types of checksum algorithm.
      • Commonly used to check for accidental/deliberate data change or corruption, e.g.:
         • Generate a checksum on October 1st.
         • Generate a checksum on October 14th and compare it to the October 1st value: are they the same?
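The October 1st/October 14th comparison on this slide is a simple fixity check. The following is a minimal Python sketch of that check. The helper names are hypothetical, and SHA-256 is an assumed default: the algorithm name is passed to `hashlib.new`, so "md5" or "sha1" also work.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hex digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unchanged(path, recorded_digest, algorithm="sha256"):
    """True if the file still matches the digest recorded earlier."""
    return file_checksum(path, algorithm) == recorded_digest
```

You would store the first digest when the file is received. Any later `unchanged(...) == False` means that at least one byte has changed.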
  14. Hash filtering / Exclusion hashing
      • A technique to identify data files obtained from different sources:
         • Calculate the checksum (e.g. MD5, SHA-1) of one or more files.
         • Compare each checksum against a checksum database indicating files known to originate from a third party.
      Checksum categories:
      • 'Known good': files that perform a legitimate purpose, e.g. operating system or application files.
      • 'Known bad': files that denote viruses, Trojans, cracker tools, or other malicious files.
      • Unknown: files that have not been previously encountered.
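The comparison step amounts to a set lookup per file. The following is a minimal Python sketch, assuming the known-good and known-bad hash sets are already loaded as collections of hex MD5 strings from datasets such as the NSRL. The names `md5_of` and `classify` are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Hex MD5 digest of a file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def classify(paths, known_good, known_bad):
    """Label each path 'known bad', 'known good' or 'unknown' by its MD5."""
    # Normalise case: hash sets are sometimes distributed in upper-case hex.
    good = {h.lower() for h in known_good}
    bad = {h.lower() for h in known_bad}
    labels = {}
    for path in paths:
        digest = md5_of(path)
        # Check known bad first so a malicious file is never filtered out as benign.
        if digest in bad:
            labels[path] = "known bad"
        elif digest in good:
            labels[path] = "known good"
        else:
            labels[path] = "unknown"
    return labels
```

For the curatorial use described in this talk, the interesting output is the "unknown" group, which holds the candidates for user-created content.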
  15. Hash datasets: Information sources
      NIST National Software Reference Library (NSRL):
      • Checksums of legitimate files generated from software products obtained through purchase/donation.
      • Stores 10,000+ software files.
      • Reference Data Set published every 3 months and available through third parties, such as Find-a-Hash.
      HashKeeper (National Drug Intelligence Center):
      • Checksums gathered through criminal investigation.
      • Academic (and other) institutions must file a FoI request to gain access to the software and database.
      Online File Signature Database (OFSDB):
      • Subscription-based system dependent upon user contribution.
      • Full access available through a subscription of 25 USD per year.
      Hash datasets are currently being used by curators/archivists to distinguish between known third-party files and potentially user-created files.
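Before filtering, a downloaded hash dataset has to be loaded into memory. The following is a minimal Python sketch. It assumes the dataset has been exported as CSV with a header row that contains an `MD5` column. Both the layout and the column name are assumptions, so check the format of the release you actually use. The function name `load_hash_set` is hypothetical.

```python
import csv


def load_hash_set(csv_path, column="MD5"):
    """Read one hash column from a CSV file into a set of lower-case hex strings."""
    hashes = set()
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip()
            if value:  # skip rows where this hash type is missing
                hashes.add(value.lower())
    return hashes
```

An in-memory set is the simplest choice for a small collection. For a full reference data set, a database or an on-disk sorted index may be more practical.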
  16. Practical Example
      60GB hard disk: 9,698 known files, 12,974 unknown files
      (Screenshots: Windows 2000 files that match the NSRL; unknown files that may be user-created; database content)
      The method may be combined with other techniques, e.g. path and filename analysis, to exclude other common files (e.g. thumbs.db).
  17. Recovering deleted data
      • Data files continue to exist in full or in part for some time after deletion.
         • The list of disk clusters occupied by the file is relabelled as 'unallocated', i.e. available for use.
      Recovering complete files:
      • Files may be recovered if the space has not been allocated to new data; recovery software may be used to recreate the pointer to files that still exist.
      • The likelihood of retrieving an entire file decreases over time.
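A deliberately simplified model can illustrate the point that deletion only relabels clusters. The `ToyVolume` class below is hypothetical and does not correspond to any real file system. Real file systems such as FAT or NTFS keep this bookkeeping in their own on-disk structures. The model does show why undeleting succeeds until the clusters are reused.

```python
class ToyVolume:
    """Hypothetical toy volume: bytes split into fixed-size clusters, a
    directory (name -> (cluster list, length)) and a set of allocated clusters."""

    def __init__(self, clusters=16, cluster_size=4):
        self.cluster_size = cluster_size
        self.clusters = clusters
        self.data = bytearray(clusters * cluster_size)
        self.allocated = set()
        self.directory = {}

    def write(self, name, content):
        cs = self.cluster_size
        needed = -(-len(content) // cs)  # ceiling division
        free = [c for c in range(self.clusters) if c not in self.allocated]
        if len(free) < needed:
            raise OSError("volume full")
        chosen = free[:needed]  # lowest free clusters first
        for i, c in enumerate(chosen):
            piece = content[i * cs:(i + 1) * cs]
            self.data[c * cs:c * cs + len(piece)] = piece
        self.allocated.update(chosen)
        self.directory[name] = (chosen, len(content))

    def read(self, name):
        chosen, length = self.directory[name]
        cs = self.cluster_size
        raw = b"".join(bytes(self.data[c * cs:(c + 1) * cs]) for c in chosen)
        return raw[:length]

    def delete(self, name):
        # Only the metadata changes: the clusters are relabelled as unallocated,
        # but their bytes stay in place until something overwrites them.
        chosen, _length = self.directory.pop(name)
        self.allocated.difference_update(chosen)

    def undelete(self, name, chosen, length):
        # Recreate the pointer from remnant metadata (cluster list and length).
        # Refuses if any cluster has been reallocated; note that this toy check
        # cannot detect a cluster that was overwritten and then freed again.
        if any(c in self.allocated for c in chosen):
            return None
        self.directory[name] = (list(chosen), length)
        self.allocated.update(chosen)
        return self.read(name)
```

Writing a new file after the deletion reuses the freed clusters, and at that point undeleting fails. This mirrors the slide's point that the chance of recovery falls over time.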
  18. (Data/File) Carving
      "File carving is the process of recovering computer files from a storage medium without the use of the standard file-system metadata that is typically used during a normal file retrieval." (http://www.techheadsitconsulting.com/f/file-carving.html)
      Useful for data recovery when:
      • The file system 'pointer' (directory entry) to the file has been deleted or corrupted.
      • Sectors allocated to the data file have been partially overwritten.
  19. Carving Techniques
      • Block-based carving
      • Header/Footer carving
      • Header/Maximum (file) size carving
      • Header/Embedded length carving
      • Statistical carving
      • Semantic carving
      • Fragment recovery carving
      • Repackaging carving
      • Fuzzy hashing carving
      Source: http://www.forensicswiki.org/wiki/File_Carving
  20. Header/Footer Carving
      Analyse the data to identify sequences that match a known file type header and footer.
      Sample header/footer information used by Scalpel to identify files:
         Type   Header                      Footer
         GIF    \x47\x49\x46\x38\x37\x61    \x00\x3b
         JPG    \xff\xd8\xff\xe0\x00\x10    \xff\xd9
         ZIP    PK\x03\x04                  \x3c\xac
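Header/footer carving scans the raw image for a header and then takes everything up to the next matching footer. The following is a minimal Python sketch that uses the Scalpel sample signatures from this slide. The function name and the 10MB search limit are assumptions. The sketch also assumes each file is stored contiguously, which is exactly where fragmentation causes trouble.

```python
# Header/footer pairs from the Scalpel sample signatures on this slide.
SIGNATURES = {
    "gif": (b"\x47\x49\x46\x38\x37\x61", b"\x00\x3b"),
    "jpg": (b"\xff\xd8\xff\xe0\x00\x10", b"\xff\xd9"),
    "zip": (b"PK\x03\x04", b"\x3c\xac"),
}


def carve_header_footer(image, header, footer, max_size=10 * 1024 * 1024):
    """Return (offset, data) for every header ... footer run in image.

    Assumes each file is stored contiguously; fragmented files will come out
    truncated or mixed with unrelated data.
    """
    found = []
    pos = image.find(header)
    while pos != -1:
        # Only look for the footer within max_size bytes of the header.
        end = image.find(footer, pos + len(header), pos + max_size)
        if end == -1:
            # No footer in range: skip this header and keep scanning.
            pos = image.find(header, pos + 1)
            continue
        end += len(footer)
        found.append((pos, bytes(image[pos:end])))
        pos = image.find(header, end)
    return found
```

A real image is usually too large to read into memory. Instead, you can pass an `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` object, which also supports `find()` and slicing. Stopping at the first footer can cut a JPEG short if it contains an embedded thumbnail with its own end marker. This is one source of the incomplete files reported later in the talk.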
  21. Other carving methods
      • Header/Maximum (file size) Carving: match the header of a known file type and extract data in sequence until a specified file size (e.g. 10MB) has been reached.
      • Header/Embedded Length Carving: a technique for carving formats that store their total size (length) in the header, e.g. BMP, PDF, AVI.
      • File structure based/Deep Carving: use documentation on the file type's structure to carve files.
      • Smart Carving: use documentation on the file system's data handling to address disk fragmentation issues.
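The first two methods on this slide differ only in how the end of the file is decided. The following Python sketch shows both, using BMP for the embedded-length case. The function names and the 50MB cap are hypothetical. The BMP layout used is the standard 14-byte file header: `BM`, the total file size as a little-endian 32-bit integer, two reserved 16-bit fields that should be zero, and the pixel-data offset. The reserved-field and size checks are heuristics to reduce false positives.

```python
import struct


def carve_header_max_size(image, header, max_size):
    """Header/maximum size carving: take max_size bytes from every header hit."""
    found = []
    pos = image.find(header)
    while pos != -1:
        found.append((pos, bytes(image[pos:pos + max_size])))
        pos = image.find(header, pos + 1)
    return found


def carve_bmp_embedded_length(image, max_size=50 * 1024 * 1024):
    """Header/embedded length carving for BMP, using the size stored in the header."""
    found = []
    pos = image.find(b"BM")
    while pos != -1:
        if pos + 14 <= len(image):  # room for a complete 14-byte file header
            size, reserved1, reserved2 = struct.unpack_from("<IHH", image, pos + 2)
            # Heuristic sanity checks before trusting the embedded length.
            if (reserved1 == reserved2 == 0 and 14 <= size <= max_size
                    and pos + size <= len(image)):
                found.append((pos, bytes(image[pos:pos + size])))
                pos = image.find(b"BM", pos + size)
                continue
        pos = image.find(b"BM", pos + 1)
    return found
```

Header/maximum size carving never misses the end of a contiguous file that is smaller than the limit, but it pads every result with trailing data. Embedded-length carving produces exact sizes when the header is trustworthy.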
  22. Data Carving tool capabilities
      A disk containing 20 deleted files (approximate sizes: five 100KB text files, five 5MB JPEGs, five 90MB WMV videos and five 300MB AVI videos) is imaged and stored as RAW/DD.
      1. PhotoRec: recovered all text files and JPEGs. 3 AVIs were recovered in their entirety; 2 were incomplete (but partially playable).
      2. Scalpel: recovered all JPEGs and 3 incomplete (but partially playable) AVIs. Did not extract WMV or TXT files.
      3. MagicRescue: only recovers files it has a 'recipe' for (JPG, AVI, but not TXT or WMV). It recovered the JPEGs, but not the AVIs, and did not attempt other formats.
      4. Foremost: unable to recover any files.
      The planned Carver 2.0 may provide intelligent carving: http://www.forensicswiki.org/wiki/Carver_2.0_Planning_Pag
  23. Real world experience
      Laptop containing a 60GB hard disk in use for 6-7 years.
      • Able to extract 363 legitimate files, but...
         • Disk fragmentation is a big problem!
         • Data carving can take a very long time: potentially weeks or months to perform in full.
         • Software instability.
         • Data carving requires a lot of disk space to store the extracted data files.
         • A large number of false positives (fake files) are produced.
         • Filestreams (e.g. images within a container) are often extracted, but not the larger file (e.g. PowerPoint).
      (Figure: examples of incomplete and invalid data files)
  24. Timeline visualisation
      A chronological list of activities performed on the host machine.
      Uses:
      • Gain an understanding of research activities on the machine
      • Investigate a specific incident
      Tools:
      • Traditionally concerned with file creation/access/modification times.
      • SuperTimeline tools are being developed that merge time data from multiple sources.
      • The OSS Timescanner is useful for generating a log of events.
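A traditional file-system timeline merges each file's timestamps into one sorted list of events. The following is a minimal Python sketch, assuming a read-only mounted copy of the image. Unlike the SuperTimeline tools mentioned above, it covers only file-system timestamps, not registry, browser or log sources. The function name `file_timeline` is hypothetical.

```python
import os
from datetime import datetime, timezone


def file_timeline(root):
    """Return (timestamp, event, path) tuples for every file under root, oldest first."""
    events = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # reads metadata only, so access times are not updated
            for event, ts in (("modified", st.st_mtime),
                              ("accessed", st.st_atime),
                              # ctime: metadata change on Unix, creation on Windows
                              ("changed", st.st_ctime)):
                events.append((datetime.fromtimestamp(ts, tz=timezone.utc), event, path))
    events.sort()
    return events
```

For a readable log, you could print each tuple with `ts.isoformat()`, or filter by event type to follow, for example, only modifications.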
  25. Text Mining
      Java characterisation tool (AQUA):
      • Uses Apache Tika to obtain information about a file collection and its textual content:
         • relative path, file name, size, modified date, SHA-256 digest, MIME type
         • word frequency from the generated Lucene index
      Stanford MUSE (Java mailbox analysis tool):
      • Relationships: grouping of contacts
      • Name lists (people, places, organizations)
      • Sentiment analysis using word lists, mapped over time
      AQUA: http://wiki.opf-labs.org/display/AQuA/Characterising+Externally+Generated+Content
      Stanford Muse: http://vis.stanford.edu/papers/muse
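AQUA obtains word frequencies via Apache Tika and a Lucene index. The following Python sketch swaps in a much simpler technique: a plain word counter over files that are already text, such as the output of a separate text-extraction step. It does not reproduce how AQUA works. The function name, the file pattern and the tokenising rule are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Lower-case words, allowing internal apostrophes ("author's").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_frequencies(root, pattern="*.txt", stopwords=frozenset()):
    """Count word occurrences across all files under root matching pattern."""
    counts = Counter()
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        counts.update(w for w in WORD.findall(text) if w not in stopwords)
    return counts
```

`word_frequencies(folder).most_common(20)` then gives a quick picture of the vocabulary in a collection. Passing a stopword set removes function words such as "the".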
  26. Conclusion (1): Challenges for the use of digital forensics in research
      Expertise of the researcher:
      • Some technical expertise is required to perform acquisition and analysis.
      Ethics of a forensic investigation:
      • Users may not realise that deleted content and scraps of content continue to exist: how do you communicate intent to your user community?
      • Terminology is currently influenced by the law enforcement community and is a barrier to wider use: forensics? suspect?
      Capabilities of the tools:
      • No single tool is appropriate; a combination of different ones is required.
      • Some integration is necessary to simplify the process.
  27. Conclusion (2): Current/Future challenges
      Multi-user systems:
      • Distinguishing between data created by multiple users on the same machine is time-consuming; it requires analysis of timestamps and other features.
      Archiving data on third-party services:
      • Ethical issues associated with accessing and archiving user data on mail servers, Second Life, cloud providers, etc.
      Diverse device and media types:
      • Solid state devices are subject to 'wear levelling', which purges inactive data (http://www.jdfsl.org/subscriptions/abstracts/abstract-v5n3-bell.htm).
      • Use of portable (personal/work) devices in the workplace, e.g. iPad, iPhone, Android devices: what is the master copy?
  28. References
      • Digital Forensics and Born-Digital Content in Cultural Heritage Collections (2010). http://www.clir.org/pubs/abstract/pub149abst.html
      • Performance Evaluation of Open-Source Disk Imaging Tools for Collecting Digital Evidence. http://www.kuis.edu.my/ictconf/proceedings/353_integration2010_proceedings.pdf
      • The Evolution of File Carving (2009). http://digital-assembly.com/technology/research/pubs/ieee-spm-2009.pdf
      • Hash filtering techniques. http://computer-forensics.sans.org/blog/2010/02/22/extracting-known-bad-hashset-from-nsrl/
      • Digital Forensic tutorials. http://computer-forensics.sans.org/blog/
      • Open Source Forensics. http://www2.opensourceforensics.org/
      • Forensics Wiki. http://www.forensicswiki.org/wiki/Main_Page
