SlideShare a Scribd company logo
1 of 40
Download to read offline
fiwalk With Me:
  Building Emergent Pre-Ingest Workflows for Digital
Archival Records using Open Source Forensic Software

      Mark A. Matienzo, Yale University Library
                  Code4lib 2011

mark@matienzo.org http://matienzo.org/ @anarchivist
Disclaimer


The following presentation expresses
 opinions of my own and not of my
   employer, my coworkers, etc.
Digital forensics?


  http://www.flickr.com/photos/freeparking/480863346/
http://www.flickr.com/photos/johnjack/3449280414
Locard’s exchange
     principle
Key Works




urn:isbn:978-0201634976     urn:isbn:978-0321268174   urn:isbn:978-1932326376
http://www.flickr.com/photos/bjornmeansbear/4662232392/
Design Principles
•Use digital forensics software and methodology to
  support accessioning of born-digital archival records

•Mitigate risk of media deterioration and obsolescence
•Prefer open source solutions whenever possible
•Integrate into a larger, but yet-to-be-defined workflow
•Use curation micro-services a guiding philosophy for
  implementation and further analysis
Applied Methodology
•Use Carrier’s (2005) model of the digital investigation
  process: Preservation    Searching    Reconstruction


•Volume and file system as main areas for analysis
•Assume much of the state is already lost
•Methods should approach or intend forensic soundness
•Ethical issues (as raised in CLIR report) are out of scope
Mitigate Risk




http://www.flickr.com/photos/moparx/4013824025/
A Larger Workflow
Phys. format inventory                   Detailed inventory                                 Create desc metadata                       Enforce restrictions
Document rcrdkeeping                     Disk imaging                                       Intellectual/phys. arr.                    Add supplemental
  systems                                File format survey                                 Lump/split dig.objects                        desc./annotations
Draft submission agmt                    Assess access reqmts                               Declare rights/access                      Use viewers as needed
Gather contextual info                   Complete submission                                   restrictions                            Log use/access
                                          agreement w/ source                               Develop access tools
                                         Assess preserv. needs                              Migrate selected recs




    Collection               Selection                                    Selection          Arrangement &            Selection
                                            Accessioning                                                                                      Access
   development                                                                                 description




                                                                    Preservation


                                                                 Migrate objects/files
                                                                 JHOVE/DROID reports
                                                                 Checksum verification
                                                                 Emulation/virt. env.




                   Physical &
                 network media

                                                                                        Preservation                   Dissemination
                                              Working storage
                                                                                          storage                         storage
Open Source Forensics
•Digital forensics is a high-stakes market
•Proprietary forensics software is not easily extensible
•Proprietary forensics software is often platform-specific
•Cultural heritage institutions are still an emerging market
  for digital forensics

•Our needs are different and still being defined
Microservices as
  Philosophy


  http://www.flickr.com/photos/gregmote/2797330534
Principles                      Preferences                            Practices
• Granularity                • Small and simple over              • Define, decompose,
                                  large and complex                   recurse


• Orthogonality              • Minimally sufficient over • Top down design, bottom
                                  feature-laden                       up implementation


• Parsimony                  • Configurable over the               • Code to interfaces
                                  prescribed


• Evolution                  • The proven over the                • Sufficiency through a
                                  merely novel                        series of incrementally
                                                                      necessary steps
                             • Outcomes over means

               UC Curation Center/California Digital Library, “UC3 Curation Foundations”
      https://confluence.ucop.edu/download/attachments/13860983/UC3-Foundations-latest.pdf
Accessioning Workflow
   Start
accessioning                              Write-protect media               Verify image
  process            Media




                                          Record identifying              Extract filesystem-     Disk     Meta-
                 Retrieve media            characteristics of               and file-level      images     data

                                          media as metadata                   metadata         Transfer package




                                                                           Package images
               Assign identifiers to                                                             Ingest transfer
                                            Create image                  and metadata for
                     media                                                                         package
                                                                               ingest




                                  Media                         FS/File                           Document
                                                Disk
                                   MD                            MD                              accessioning
                                               image
                                                                                                   process




                                                                                                     End
                                                                                                 accessioning
                                                                                                   process
Implementation


 http://www.flickr.com/photos/generated/2084287794/
Disk Imaging
•dd: creates raw images
 •Related implementations: dc3dd, dcfldd, dd_rescue
 •Fast, but no mechanism to store imaging metadata
•Advanced Forensic Format (AFF)/AFFLib
 •Cross platform, reasonably fast
 •Can store arbitrary metadata
 •Plenty of GUI-based imaging tools
AFF File Structure




Garfinkel et al. 2006 (http://nrs.harvard.edu/urn-3:HUL.InstRepos:2829932), p. 27
Accessioning Workflow
   Start
accessioning                                     Write-protect media                 Verify image
                                                                               F
  process            Media                                                  AF




                                                 Record identifying                Extract filesystem-     Disk     Meta-
                 Retrieve media                   characteristics of                 and file-level      images     data

                                             F   media as metadata                     metadata         Transfer package
                                          AF




                                                                                    Package images
               Assign identifiers to                                                                      Ingest transfer
                                                   Create image                    and metadata for
                     media                                                                                  package
                                                                                        ingest
                                             F
                                          AF




                                  Media                                FS/File                             Document
                                                       Disk
                                   MD                                   MD                                accessioning
                                                      image
                                                                                                            process




                                                                                                              End
                                                                                                          accessioning
                                                                                                            process
The Sleuth Kit
•Open source C library, command line tools, and browser-
  based Perl application (Autopsy) for forensic analysis

•Supports analysis of NTFS, FAT, HFS+, Ext2/3, UFS1/2
•Splits tools into layers: volume system, file system, file
  name, metadata, data unit (“block”)

•Additional utilities to sort and post-process extracted
  metadata
Extracting Metadata & Files
$ fls -m A: -a -f fat 2004-M-008.0018.aff
0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114
0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226
0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ fls -m A: -a -f fat 2004-M-008.0018.aff | mactime
Wed Dec 31 1969 19:00:00   202152 ..c. r/rrwxrwxrwx   0      0        3        A:/_ublist1.wpd (deleted)
                           202119 ..c. r/rrwxrwxrwx   0      0        4        A:/publist.wpd
                           205607 ..c. r/rrwxrwxrwx   0      0        7        A:/publistwkg.wpd
Thu Feb 22 2001 00:00:00   202152 .a.. r/rrwxrwxrwx   0      0        3        A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:30:52   202152 m... r/rrwxrwxrwx   0      0        3        A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:31:54   202152 ...b r/rrwxrwxrwx   0      0        3        A:/_ublist1.wpd (deleted)
Fri Feb 23 2001 16:17:18   205607 m... r/rrwxrwxrwx   0      0        7        A:/publistwkg.wpd
Fri Feb 23 2001 16:17:34   202119 m... r/rrwxrwxrwx   0      0        4        A:/publist.wpd
Fri Feb 23 2001 16:20:26   202119 ...b r/rrwxrwxrwx   0      0        4        A:/publist.wpd
Fri Feb 23 2001 16:20:37   205607 ...b r/rrwxrwxrwx   0      0        7        A:/publistwkg.wpd
Tue Sep 21 2010 00:00:00   202119 .a.. r/rrwxrwxrwx   0      0        4        A:/publist.wpd
                           205607 .a.. r/rrwxrwxrwx   0      0        7        A:/publistwkg.wpd

$ icat 2004-M-008.0018.aff 4 | fido.sh -
OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN"

$ icat 2004-M-008.018.aff 4 > publist.wpd
Extracting Metadata & Files
$ fls -m A: -a -f fat 2004-M-008.0018.aff
0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114
0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226
0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime
Wed Dec 31 1969 19:00:00   202152 ..c. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
                           202119 ..c. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 ..c. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Thu Feb 22 2001 00:00:00   202152 .a.. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:30:52   202152 m... r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:31:54   202152 ...b r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Fri Feb 23 2001 16:17:18   205607 m... r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Fri Feb 23 2001 16:17:34   202119 m... r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:26   202119 ...b r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:37   205607 ...b r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Tue Sep 21 2010 00:00:00   202119 .a.. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 .a.. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd

$ icat 2004-M-008.0018.aff 4 | fido.sh -
OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN"

$ icat 2004-M-008.018.aff 4 > publist.wpd
Extracting Metadata & Files
$ fls -m A: -a -f fat 2004-M-008.0018.aff
0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114
0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226
0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime
Wed Dec 31 1969 19:00:00   202152 ..c. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
                           202119 ..c. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 ..c. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Thu Feb 22 2001 00:00:00   202152 .a.. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:30:52   202152 m... r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:31:54   202152 ...b r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Fri Feb 23 2001 16:17:18   205607 m... r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Fri Feb 23 2001 16:17:34   202119 m... r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:26   202119 ...b r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:37   205607 ...b r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Tue Sep 21 2010 00:00:00   202119 .a.. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 .a.. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd

$ icat 2004-M-008.0018.aff 4 | fido.sh -
OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN"

$ icat 2004-M-008.018.aff 4 > publist.wpd
Extracting Metadata & Files
$ fls -m A: -a -f fat 2004-M-008.0018.aff
0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114
0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226
0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime
Wed Dec 31 1969 19:00:00   202152 ..c. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
                           202119 ..c. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 ..c. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Thu Feb 22 2001 00:00:00   202152 .a.. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:30:52   202152 m... r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:31:54   202152 ...b r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Fri Feb 23 2001 16:17:18   205607 m... r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Fri Feb 23 2001 16:17:34   202119 m... r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:26   202119 ...b r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:37   205607 ...b r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Tue Sep 21 2010 00:00:00   202119 .a.. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 .a.. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd

$ icat 2004-M-008.0018.aff 4 | fido.sh -
OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN"

$ icat 2004-M-008.018.aff 4 > publist.wpd
Extracting Metadata & Files
$ fls -m A: -a -f fat 2004-M-008.0018.aff
0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114
0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226
0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237
0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0
0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0
0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0
0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0

$ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime
Wed Dec 31 1969 19:00:00   202152 ..c. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
                           202119 ..c. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 ..c. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Thu Feb 22 2001 00:00:00   202152 .a.. r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:30:52   202152 m... r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Thu Feb 22 2001 17:31:54   202152 ...b r/rrwxrwxrwx 0        0        3         A:/_ublist1.wpd (deleted)
Fri Feb 23 2001 16:17:18   205607 m... r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Fri Feb 23 2001 16:17:34   202119 m... r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:26   202119 ...b r/rrwxrwxrwx 0        0        4         A:/publist.wpd
Fri Feb 23 2001 16:20:37   205607 ...b r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd
Tue Sep 21 2010 00:00:00   202119 .a.. r/rrwxrwxrwx 0        0        4         A:/publist.wpd
                           205607 .a.. r/rrwxrwxrwx 0        0        7         A:/publistwkg.wpd

$ icat 2004-M-008.0018.aff 4 | fido.sh -
OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN"

$ icat 2004-M-008.018.aff 4 > publist.wpd
Accessioning Workflow
   Start
accessioning                              Write-protect media                 Verify image
                                                                        K
  process            Media                                           TS




                                          Record identifying                Extract filesystem-     Disk     Meta-
                 Retrieve media            characteristics of                 and file-level      images     data

                                          media as metadata             K       metadata         Transfer package
                                                                     TS




                                                                             Package images
               Assign identifiers to                                                               Ingest transfer
                                            Create image                    and metadata for
                     media                                                                           package
                                                                                 ingest




                                  Media                         FS/File                             Document
                                                Disk
                                   MD                            MD                                accessioning
                                               image
                                                                                                     process




                                                                                                       End
                                                                                                   accessioning
                                                                                                     process
fiwalk
•C++ program with Python module for processing images
•Outputs results in plain text key/value pairs, XML, CSV,
  or ARFF (for Weka data mining software)

•Developed to support automated forensic processing by
  breaking it into three steps: extract, represent, process

•Pluggable file-level metadata extraction (expects key/
  value pairs)

•Makes development easy and fast
Sample fiwalk Output
<?xml version='1.0' encoding='UTF-8'?>
<fiwalk xmloutputversion='0.3'>
  <metadata> <!-- metadata about the disk image -->
  <creator> <!-- fiwalk provenance metadata (runtime environment, etc.) --></creator>
  <source>
    <image_filename>2004-M-008.dd-0018.001</image_filename>
  </source>
<!-- fs start: 0 -->
  <volume offset='0'>
    <!-- volume metadata -->
    <fileobject>
      <filename>_ublist1.wpd</filename>
      <!-- more metadata about specific files within the image -->
    </fileobject>
    <fileobject/><!-- one for each file -->
  </volume>
  <runstats>
    <!-- runtime statistics -->
  </runstats>
</fiwalk>
Sample fiwalk Output
<fileobject>
  <filename>_ublist1.wpd</filename>
  <partition>1</partition>
  <id>1</id>
  <name_type>r</name_type>
  <filesize>202152</filesize>
  <unalloc>1</unalloc>
  <used>1</used>
  <inode>3</inode>
  <meta_type>1</meta_type>
  <mode>511</mode>
  <nlink>0</nlink>
  <uid>0</uid>
  <gid>0</gid>
  <mtime>982881052</mtime>
  <atime>982818000</atime>
  <crtime>982881114</crtime>
  <libmagic>(Corel/WP)</libmagic>
  <byte_runs>
   <run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/>
  </byte_runs>
  <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest>
  <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest>
</fileobject>
Accessioning Workflow
   Start
accessioning                              Write-protect media                  Verify image
  process            Media




                                          Record identifying                 Extract filesystem-     Disk     Meta-
                 Retrieve media            characteristics of                  and file-level      images     data

                                          media as metadata           al
                                                                         k       metadata         Transfer package
                                                                    fiw




                                                                              Package images
               Assign identifiers to                                                                Ingest transfer
                                            Create image                     and metadata for
                     media                                                                            package
                                                                                  ingest




                                  Media                         FS/File                              Document
                                                Disk
                                   MD                            MD                                 accessioning
                                               image
                                                                                                      process




                                                                                                        End
                                                                                                    accessioning
                                                                                                      process
Why fiwalk?
•Faster (and more forensically sound) to extract metadata
  once rather than having to keep processing an image

•Develop better assessments during accessioning process
  (directory structure significant? timestamps accurate?)

•fiwalk’s output is something like a METS structMap
•Building non-invasive assessment tools takes less time
http://www.flickr.com/photos/cdrummbks/4574882817/
Gumshoe
•Prototype application
•Blacklight (Ruby on Rails + Solr) & Python indexing code
•Indexing code works with fiwalk output or directly over a
  disk image (using fiwalk’s Python bindings)

•Populates Solr index with all file-level metadata from
  fiwalk and text strings extracted from files

•Code at http://github.com/anarchivist/gumshoe
•Demo at http://xgumshoex.heroku.com/
Future Directions




 http://www.flickr.com/photos/87913776@N00/5129662279/
AFF4
•Emerging format, with tools still under development
•Better for a distributed environment
•Friendlier to micro-services philosophy
•Clearer object and metadata model (RDF-based)
•Container format is Zip64
•Cohen and Schatz 2010 (doi:10.1016/j.diin.2010.05.015)
  show hash based imaging as a more efficient alternative
Sample AFF4 Metadata
                @prefix   D1: <urn:aff4:652e4027-27fab2941>
                @prefix   G1: <urn:aff4:19857a87-a190b2f87>
                @prefix   G2: <urn:aff4:0a1fc78a-927bfacef>
                @prefix   S1: <urn:aff4:652e4027-ffff01199>
                @prefix   I1: <urn:aff4:9003027a-11199ffff>
                @prefix   aff4: <http://afflib.org/2009/aff4/#>
                @prefix   dcterms: <http://purl.org/dc/terms/>

                G1: {
                    D1: aff4:serialNumber "zx322o91"
                    D1: aff4:hash "3897450fa18094b13"^^aff4:md5
                }

                G2: {
                    S1:   aff4:name "aff4imager"
                    S1:   aff4:vendor <http://aff.org/>
                    S1:   aff4:asserts G1:
                    S1:   aff4:type aff4:AcquisitionTool.
                    S1:   aff4:version "0.2"
                    I1:   aff4:type aff4:Image.
                    I1:   aff4:hash "3897450fa18094b13"^^aff4:md5
                    I1:   dcterms:creator S1:
                }

 Adapted from Schatz and Cohen 2010 (doi:10.1007/978-3-642-15506-2_16), pp. 234-236
Accessioning Workflow
   Start
accessioning                                                    Write-protect media                    Verify image
                                                                                             F   4
  process                       Media                                                     AF




                                                                Record identifying                   Extract filesystem-     Disk     Meta-
                            Retrieve media                       characteristics of                    and file-level      images     data

                                                                media as metadata                        metadata         Transfer package
                                                        F   4
                                                     AF




                                                                                                      Package images
                          Assign identifiers to                                                                             Ingest transfer
                                                                  Create image                       and metadata for
                                media                                                                                         package
                                                                                                          ingest
                  F   4                                 F   4                                F   4
               AF                                    AF                                   AF




                                             Media                                    FS/File                                Document
                                                                      Disk
                                              MD                                       MD                                   accessioning
                                                                     image
                                                                                                                              process




                                                                                                                                End
                                                                                                                            accessioning
                                                                                                                              process
Further Toolsets


  http://www.flickr.com/photos/oskay/5369749968/
Thank You

                        mark@matienzo.org
                        http://matienzo.org/
                        twitter: @anarchivist




This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

More Related Content

What's hot

Scanning & document management
Scanning & document managementScanning & document management
Scanning & document managementGautam Ganguly
 
Data Loss Prevention de RSA
Data Loss Prevention de RSAData Loss Prevention de RSA
Data Loss Prevention de RSAAEC Networks
 
Advanced modelling made simple with the Gmodel metalanguage
Advanced modelling made simple with the Gmodel metalanguageAdvanced modelling made simple with the Gmodel metalanguage
Advanced modelling made simple with the Gmodel metalanguageJorn Bettin
 
SANsymphony V
SANsymphony VSANsymphony V
SANsymphony VTTEC
 
Linked data and the LOCAH project ILI2011
Linked data and the LOCAH project ILI2011Linked data and the LOCAH project ILI2011
Linked data and the LOCAH project ILI2011Bethan Ruddock
 
NCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNebraska Library Commission
 
IBM Tivoli Storage Productivit Center overview and update
IBM Tivoli Storage Productivit Center overview and updateIBM Tivoli Storage Productivit Center overview and update
IBM Tivoli Storage Productivit Center overview and updateTony Pearson
 
Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Alex Hardisty
 

What's hot (15)

Scanning & document management
Scanning & document managementScanning & document management
Scanning & document management
 
Hardware
HardwareHardware
Hardware
 
Data Loss Prevention de RSA
Data Loss Prevention de RSAData Loss Prevention de RSA
Data Loss Prevention de RSA
 
Iw2415551560
Iw2415551560Iw2415551560
Iw2415551560
 
Ronnie Oomen (EMC)
Ronnie Oomen (EMC)Ronnie Oomen (EMC)
Ronnie Oomen (EMC)
 
Advanced modelling made simple with the Gmodel metalanguage
Advanced modelling made simple with the Gmodel metalanguageAdvanced modelling made simple with the Gmodel metalanguage
Advanced modelling made simple with the Gmodel metalanguage
 
UML Profile for DDS
UML Profile for DDSUML Profile for DDS
UML Profile for DDS
 
SANsymphony V
SANsymphony VSANsymphony V
SANsymphony V
 
Linked data and the LOCAH project ILI2011
Linked data and the LOCAH project ILI2011Linked data and the LOCAH project ILI2011
Linked data and the LOCAH project ILI2011
 
NCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and Protection
 
IBM Tivoli Storage Productivit Center overview and update
IBM Tivoli Storage Productivit Center overview and updateIBM Tivoli Storage Productivit Center overview and update
IBM Tivoli Storage Productivit Center overview and update
 
Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3
 
Lec7
Lec7Lec7
Lec7
 
Dig c curr
Dig c currDig c curr
Dig c curr
 
Ch12
Ch12Ch12
Ch12
 

Viewers also liked

Let's Get Small: A Microservices Approach to Library Websites
Let's Get Small: A Microservices Approach to Library WebsitesLet's Get Small: A Microservices Approach to Library Websites
Let's Get Small: A Microservices Approach to Library WebsitesMrDys
 
Progress with FITS for analyzing video
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing videoprwheatley
 
Digital Preservation
Digital PreservationDigital Preservation
Digital PreservationMichael Day
 
HNI leeromgeving 05
HNI leeromgeving 05HNI leeromgeving 05
HNI leeromgeving 05fneggers
 
HNI leeromgeving 04
HNI leeromgeving 04HNI leeromgeving 04
HNI leeromgeving 04fneggers
 
Rebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestRebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestdri_ireland
 
The lifecycle of a short story
The lifecycle of a short storyThe lifecycle of a short story
The lifecycle of a short storyLynne Thomas
 
HNI leeromgeving 02
HNI leeromgeving 02HNI leeromgeving 02
HNI leeromgeving 02fneggers
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Lynne Thomas
 
Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...LIBER Europe
 
Mapping the Digital Preservation Wilderness: What you need to know
Mapping the Digital Preservation Wilderness:  What you need to knowMapping the Digital Preservation Wilderness:  What you need to know
Mapping the Digital Preservation Wilderness: What you need to knowJody DeRidder
 
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie HarrowerCNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrowerdri_ireland
 
Character profiles
Character profilesCharacter profiles
Character profilesEllenorHandy
 

Viewers also liked (20)

DiscoveringDH_ManagingDigitalProjects
DiscoveringDH_ManagingDigitalProjectsDiscoveringDH_ManagingDigitalProjects
DiscoveringDH_ManagingDigitalProjects
 
Let's Get Small: A Microservices Approach to Library Websites
Let's Get Small: A Microservices Approach to Library WebsitesLet's Get Small: A Microservices Approach to Library Websites
Let's Get Small: A Microservices Approach to Library Websites
 
[Dpf manager] berlin workshop
[Dpf manager] berlin workshop[Dpf manager] berlin workshop
[Dpf manager] berlin workshop
 
Adding MediaConch to Archivematica for mkv/ffv1 checking
Adding MediaConch to Archivematica for mkv/ffv1 checkingAdding MediaConch to Archivematica for mkv/ffv1 checking
Adding MediaConch to Archivematica for mkv/ffv1 checking
 
cv (2)
cv (2)cv (2)
cv (2)
 
Progress with FITS for analyzing video
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing video
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
SHAREmodule3
SHAREmodule3SHAREmodule3
SHAREmodule3
 
HNI leeromgeving 05
HNI leeromgeving 05HNI leeromgeving 05
HNI leeromgeving 05
 
HNI leeromgeving 04
HNI leeromgeving 04HNI leeromgeving 04
HNI leeromgeving 04
 
Rebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestRebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingest
 
The lifecycle of a short story
The lifecycle of a short storyThe lifecycle of a short story
The lifecycle of a short story
 
HNI leeromgeving 02
HNI leeromgeving 02HNI leeromgeving 02
HNI leeromgeving 02
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2
 
Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...
 
Mapping the Digital Preservation Wilderness: What you need to know
Mapping the Digital Preservation Wilderness:  What you need to knowMapping the Digital Preservation Wilderness:  What you need to know
Mapping the Digital Preservation Wilderness: What you need to know
 
ENArC- international cooperation, current and past project activities - statu...
ENArC- international cooperation, current and past project activities - statu...ENArC- international cooperation, current and past project activities - statu...
ENArC- international cooperation, current and past project activities - statu...
 
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie HarrowerCNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
 
SHAREmodule2
SHAREmodule2SHAREmodule2
SHAREmodule2
 
Character profiles
Character profilesCharacter profiles
Character profiles
 

Similar to Building Emergent Pre-Ingest Workflows

Workshop 1 revised
Workshop 1 revisedWorkshop 1 revised
Workshop 1 revisedpeterchanws
 
Bit Level Preservation
Bit Level PreservationBit Level Preservation
Bit Level PreservationMicah Altman
 
Securing Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security ManagementSecuring Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security ManagementNovell
 
Wewebu customer success story California Dept. of Public Health
Wewebu customer success story California Dept. of Public HealthWewebu customer success story California Dept. of Public Health
Wewebu customer success story California Dept. of Public HealthWeWebU Software AG
 
Tackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaTackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaCourtney Mumma
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceRobert H. McDonald
 
Preservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyPreservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyGarethKnight
 
Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...samalanmeister
 
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGY
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGYCYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGY
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGYjmical
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOpenStorageSummit
 
January 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJanuary 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJohn Wang
 
Open Digital Education Software Kit
Open Digital Education Software KitOpen Digital Education Software Kit
Open Digital Education Software Kitischool webboard
 
KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...JISC KeepIt project
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories prwheatley
 
C P Doc Rev Story
C P Doc Rev StoryC P Doc Rev Story
C P Doc Rev StoryCp Docrev
 
What to curate? Preserving and Curating Software-Based Art
What to curate? Preserving and Curating Software-Based ArtWhat to curate? Preserving and Curating Software-Based Art
What to curate? Preserving and Curating Software-Based Artneilgrindley
 

Similar to Building Emergent Pre-Ingest Workflows (20)

Workshop 1 revised
Workshop 1 revisedWorkshop 1 revised
Workshop 1 revised
 
Bit Level Preservation
Bit Level PreservationBit Level Preservation
Bit Level Preservation
 
Securing Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security ManagementSecuring Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security Management
 
Wewebu customer success story California Dept. of Public Health
Wewebu customer success story California Dept. of Public HealthWewebu customer success story California Dept. of Public Health
Wewebu customer success story California Dept. of Public Health
 
Tackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaTackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in Archivematica
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
 
Preservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyPreservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategy
 
Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...
 
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGY
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGYCYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGY
CYBER INTELLIGENCE &amp; RESPONSE TECHNOLOGY
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal Stern
 
January 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJanuary 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations Presentation
 
Open Digital Education Software Kit
Open Digital Education Software KitOpen Digital Education Software Kit
Open Digital Education Software Kit
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
354 ch1
354 ch1354 ch1
354 ch1
 
Resource space
Resource spaceResource space
Resource space
 
KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...
 
Saadallah vtls
Saadallah vtlsSaadallah vtls
Saadallah vtls
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
 
C P Doc Rev Story
C P Doc Rev StoryC P Doc Rev Story
C P Doc Rev Story
 
What to curate? Preserving and Curating Software-Based Art
What to curate? Preserving and Curating Software-Based ArtWhat to curate? Preserving and Curating Software-Based Art
What to curate? Preserving and Curating Software-Based Art
 

More from Mark Matienzo

To Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameTo Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameMark Matienzo
 
Linked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextLinked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextMark Matienzo
 
Archival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationArchival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationMark Matienzo
 
Findability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingFindability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingMark Matienzo
 
Learning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataLearning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataMark Matienzo
 
EAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LEAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LMark Matienzo
 
Online Presence and Participation
Online Presence and ParticipationOnline Presence and Participation
Online Presence and ParticipationMark Matienzo
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsMark Matienzo
 
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Mark Matienzo
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
How I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataHow I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataMark Matienzo
 

More from Mark Matienzo (11)

To Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameTo Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to Name
 
Linked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextLinked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival Context
 
Archival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationArchival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an Iteration
 
Findability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingFindability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through Linking
 
Learning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataLearning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing Metadata
 
EAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LEAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-L
 
Online Presence and Participation
Online Presence and ParticipationOnline Presence and Participation
Online Presence and Participation
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
 
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
How I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataHow I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadata
 

Recently uploaded

Top 10 Wealthiest People In The World.pdf
Top 10 Wealthiest People In The World.pdfTop 10 Wealthiest People In The World.pdf
Top 10 Wealthiest People In The World.pdfauroraaudrey4826
 
Brief biography of Julius Robert Oppenheimer
Brief biography of Julius Robert OppenheimerBrief biography of Julius Robert Oppenheimer
Brief biography of Julius Robert OppenheimerOmarCabrera39
 
Opportunities, challenges, and power of media and information
Opportunities, challenges, and power of media and informationOpportunities, challenges, and power of media and information
Opportunities, challenges, and power of media and informationReyMonsales
 
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victory
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep VictoryAP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victory
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victoryanjanibaddipudi1
 
Referendum Party 2024 Election Manifesto
Referendum Party 2024 Election ManifestoReferendum Party 2024 Election Manifesto
Referendum Party 2024 Election ManifestoSABC News
 
Quiz for Heritage Indian including all the rounds
Quiz for Heritage Indian including all the roundsQuiz for Heritage Indian including all the rounds
Quiz for Heritage Indian including all the roundsnaxymaxyy
 
Manipur-Book-Final-2-compressed.pdfsal'rpk
Manipur-Book-Final-2-compressed.pdfsal'rpkManipur-Book-Final-2-compressed.pdfsal'rpk
Manipur-Book-Final-2-compressed.pdfsal'rpkbhavenpr
 
IndiaWest: Your Trusted Source for Today's Global News
IndiaWest: Your Trusted Source for Today's Global NewsIndiaWest: Your Trusted Source for Today's Global News
IndiaWest: Your Trusted Source for Today's Global NewsIndiaWest2
 
complaint-ECI-PM-media-1-Chandru.pdfra;;prfk
complaint-ECI-PM-media-1-Chandru.pdfra;;prfkcomplaint-ECI-PM-media-1-Chandru.pdfra;;prfk
complaint-ECI-PM-media-1-Chandru.pdfra;;prfkbhavenpr
 
57 Bidens Annihilation Nation Policy.pdf
57 Bidens Annihilation Nation Policy.pdf57 Bidens Annihilation Nation Policy.pdf
57 Bidens Annihilation Nation Policy.pdfGerald Furnkranz
 
Rohan Jaitley: Central Gov't Standing Counsel for Justice
Rohan Jaitley: Central Gov't Standing Counsel for JusticeRohan Jaitley: Central Gov't Standing Counsel for Justice
Rohan Jaitley: Central Gov't Standing Counsel for JusticeAbdulGhani778830
 
VIP Girls Available Call or WhatsApp 9711199012
VIP Girls Available Call or WhatsApp 9711199012VIP Girls Available Call or WhatsApp 9711199012
VIP Girls Available Call or WhatsApp 9711199012ankitnayak356677
 
Global Terrorism and its types and prevention ppt.
Global Terrorism and its types and prevention ppt.Global Terrorism and its types and prevention ppt.
Global Terrorism and its types and prevention ppt.NaveedKhaskheli1
 

Recently uploaded (13)

Top 10 Wealthiest People In The World.pdf
Top 10 Wealthiest People In The World.pdfTop 10 Wealthiest People In The World.pdf
Top 10 Wealthiest People In The World.pdf
 
Brief biography of Julius Robert Oppenheimer
Brief biography of Julius Robert OppenheimerBrief biography of Julius Robert Oppenheimer
Brief biography of Julius Robert Oppenheimer
 
Opportunities, challenges, and power of media and information
Opportunities, challenges, and power of media and informationOpportunities, challenges, and power of media and information
Opportunities, challenges, and power of media and information
 
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victory
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep VictoryAP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victory
AP Election Survey 2024: TDP-Janasena-BJP Alliance Set To Sweep Victory
 
Referendum Party 2024 Election Manifesto
Referendum Party 2024 Election ManifestoReferendum Party 2024 Election Manifesto
Referendum Party 2024 Election Manifesto
 
Quiz for Heritage Indian including all the rounds
Quiz for Heritage Indian including all the roundsQuiz for Heritage Indian including all the rounds
Quiz for Heritage Indian including all the rounds
 
Manipur-Book-Final-2-compressed.pdfsal'rpk
Manipur-Book-Final-2-compressed.pdfsal'rpkManipur-Book-Final-2-compressed.pdfsal'rpk
Manipur-Book-Final-2-compressed.pdfsal'rpk
 
IndiaWest: Your Trusted Source for Today's Global News
IndiaWest: Your Trusted Source for Today's Global NewsIndiaWest: Your Trusted Source for Today's Global News
IndiaWest: Your Trusted Source for Today's Global News
 
complaint-ECI-PM-media-1-Chandru.pdfra;;prfk
complaint-ECI-PM-media-1-Chandru.pdfra;;prfkcomplaint-ECI-PM-media-1-Chandru.pdfra;;prfk
complaint-ECI-PM-media-1-Chandru.pdfra;;prfk
 
57 Bidens Annihilation Nation Policy.pdf
57 Bidens Annihilation Nation Policy.pdf57 Bidens Annihilation Nation Policy.pdf
57 Bidens Annihilation Nation Policy.pdf
 
Rohan Jaitley: Central Gov't Standing Counsel for Justice
Rohan Jaitley: Central Gov't Standing Counsel for JusticeRohan Jaitley: Central Gov't Standing Counsel for Justice
Rohan Jaitley: Central Gov't Standing Counsel for Justice
 
VIP Girls Available Call or WhatsApp 9711199012
VIP Girls Available Call or WhatsApp 9711199012VIP Girls Available Call or WhatsApp 9711199012
VIP Girls Available Call or WhatsApp 9711199012
 
Global Terrorism and its types and prevention ppt.
Global Terrorism and its types and prevention ppt.Global Terrorism and its types and prevention ppt.
Global Terrorism and its types and prevention ppt.
 

Building Emergent Pre-Ingest Workflows

  • 1. fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival Records using Open Source Forensic Software Mark A. Matienzo, Yale University Library Code4lib 2011 mark@matienzo.org http://matienzo.org/ @anarchivist
  • 2. Disclaimer The following presentation expresses opinions of my own and not of my employer, my coworkers, etc.
  • 3. Digital forensics? http://www.flickr.com/photos/freeparking/480863346/
  • 6. Key Works urn:isbn:978-0201634976 urn:isbn:978-0321268174 urn:isbn:978-1932326376
  • 8. Design Principles •Use digital forensics software and methodology to support accessioning of born-digital archival records •Mitigate risk of media deterioration and obsolescence •Prefer open source solutions whenever possible •Integrate into a larger, but yet-to-be-defined workflow •Use curation micro-services a guiding philosophy for implementation and further analysis
  • 9. Applied Methodology •Use Carrier’s (2005) model of the digital investigation process: Preservation Searching Reconstruction •Volume and file system as main areas for analysis •Assume much of the state is already lost •Methods should approach or intend forensic soundness •Ethical issues (as raised in CLIR report) are out of scope
  • 11. A Larger Workflow Phys. format inventory Detailed inventory Create desc metadata Enforce restrictions Document rcrdkeeping Disk imaging Intellectual/phys. arr. Add supplemental systems File format survey Lump/split dig.objects desc./annotations Draft submission agmt Assess access reqmts Declare rights/access Use viewers as needed Gather contextual info Complete submission restrictions Log use/access agreement w/ source Develop access tools Assess preserv. needs Migrate selected recs Collection Selection Selection Arrangement & Selection Accessioning Access development description Preservation Migrate objects/files JHOVE/DROID reports Checksum verification Emulation/virt. env. Physical & network media Preservation Dissemination Working storage storage storage
  • 12. Open Source Forensics •Digital forensics is a high-stakes market •Proprietary forensics software is not easily extensible •Proprietary forensics software is often platform-specific •Cultural heritage institutions are still an emerging market for digital forensics •Our needs are different and still being defined
  • 13. Microservices as Philosophy http://www.flickr.com/photos/gregmote/2797330534
  • 14. Principles Preferences Practices • Granularity • Small and simple over • Define, decompose, large and complex recurse • Orthogonality • Minimally sufficient over • Top down design, bottom feature-laden up implementation • Parsimony • Configurable over the • Code to interfaces prescribed • Evolution • The proven over the • Sufficiency through a merely novel series of incrementally necessary steps • Outcomes over means UC Curation Center/California Digital Library, “UC3 Curation Foundations” https://confluence.ucop.edu/download/attachments/13860983/UC3-Foundations-latest.pdf
  • 15. Accessioning Workflow Start accessioning Write-protect media Verify image process Media Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data media as metadata metadata Transfer package Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 17. Disk Imaging •dd: creates raw images •Related implementations: dc3dd, dcfldd, dd_rescue •Fast, but no mechanism to store imaging metadata •Advanced Forensic Format (AFF)/AFFLib •Cross platform, reasonably fast •Can store arbitrary metadata •Plenty of GUI-based imaging tools
  • 18. AFF File Structure Garfinkel et al. 2006 (http://nrs.harvard.edu/urn-3:HUL.InstRepos:2829932), p. 27
  • 19. Accessioning Workflow Start accessioning Write-protect media Verify image F process Media AF Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data F media as metadata metadata Transfer package AF Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest F AF Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 20. The Sleuth Kit •Open source C library, command line tools, and browser- based Perl application (Autopsy) for forensic analysis •Supports analysis of NTFS, FAT, HFS+, Ext2/3, UFS1/2 •Splits tools into layers: volume system, file system, file name, metadata, data unit (“block”) •Additional utilities to sort and post-process extracted metadata
  • 21. Extracting Metadata & Files $ fls -m A: -a -f fat 2004-M-008.0018.aff 0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114 0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226 0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237 0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0 0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0 0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0 0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0 $ fls -m A: -a -f fat 2004-M-008.0018.aff | mactime Wed Dec 31 1969 19:00:00 202152 ..c. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) 202119 ..c. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 ..c. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Thu Feb 22 2001 00:00:00 202152 .a.. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:30:52 202152 m... r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:31:54 202152 ...b r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Fri Feb 23 2001 16:17:18 205607 m... r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Fri Feb 23 2001 16:17:34 202119 m... r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:26 202119 ...b r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:37 205607 ...b r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Tue Sep 21 2010 00:00:00 202119 .a.. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 .a.. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd $ icat 2004-M-008.0018.aff 4 | fido.sh - OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN" $ icat 2004-M-008.018.aff 4 > publist.wpd
  • 22. Extracting Metadata & Files $ fls -m A: -a -f fat 2004-M-008.0018.aff 0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114 0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226 0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237 0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0 0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0 0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0 0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0 $ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime Wed Dec 31 1969 19:00:00 202152 ..c. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) 202119 ..c. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 ..c. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Thu Feb 22 2001 00:00:00 202152 .a.. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:30:52 202152 m... r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:31:54 202152 ...b r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Fri Feb 23 2001 16:17:18 205607 m... r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Fri Feb 23 2001 16:17:34 202119 m... r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:26 202119 ...b r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:37 205607 ...b r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Tue Sep 21 2010 00:00:00 202119 .a.. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 .a.. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd $ icat 2004-M-008.0018.aff 4 | fido.sh - OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN" $ icat 2004-M-008.018.aff 4 > publist.wpd
  • 23. Extracting Metadata & Files $ fls -m A: -a -f fat 2004-M-008.0018.aff 0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114 0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226 0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237 0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0 0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0 0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0 0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0 $ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime Wed Dec 31 1969 19:00:00 202152 ..c. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) 202119 ..c. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 ..c. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Thu Feb 22 2001 00:00:00 202152 .a.. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:30:52 202152 m... r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:31:54 202152 ...b r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Fri Feb 23 2001 16:17:18 205607 m... r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Fri Feb 23 2001 16:17:34 202119 m... r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:26 202119 ...b r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:37 205607 ...b r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Tue Sep 21 2010 00:00:00 202119 .a.. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 .a.. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd $ icat 2004-M-008.0018.aff 4 | fido.sh - OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN" $ icat 2004-M-008.018.aff 4 > publist.wpd
  • 24. Extracting Metadata & Files $ fls -m A: -a -f fat 2004-M-008.0018.aff 0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114 0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226 0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237 0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0 0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0 0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0 0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0 $ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime Wed Dec 31 1969 19:00:00 202152 ..c. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) 202119 ..c. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 ..c. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Thu Feb 22 2001 00:00:00 202152 .a.. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:30:52 202152 m... r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:31:54 202152 ...b r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Fri Feb 23 2001 16:17:18 205607 m... r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Fri Feb 23 2001 16:17:34 202119 m... r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:26 202119 ...b r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:37 205607 ...b r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Tue Sep 21 2010 00:00:00 202119 .a.. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 .a.. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd $ icat 2004-M-008.0018.aff 4 | fido.sh - OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN" $ icat 2004-M-008.018.aff 4 > publist.wpd
  • 25. Extracting Metadata & Files $ fls -m A: -a -f fat 2004-M-008.0018.aff 0|A:/_ublist1.wpd (deleted)|3|r/rrwxrwxrwx|0|0|202152|982818000|982881052|0|982881114 0|A:/publist.wpd|4|r/rrwxrwxrwx|0|0|202119|1285041600|982963054|0|982963226 0|A:/publistwkg.wpd|7|r/rrwxrwxrwx|0|0|205607|1285041600|982963038|0|982963237 0|A:/$MBR|45779|v/v---------|0|0|512|0|0|0|0 0|A:/$FAT1|45780|v/v---------|0|0|4608|0|0|0|0 0|A:/$FAT2|45781|v/v---------|0|0|4608|0|0|0|0 0|A:/$OrphanFiles|45782|d/d---------|0|0|0|0|0|0|0 $ fls -m A: -a -f fat ~/Desktop/2004-M-008/data/2004-M-008.0018.aff | mactime Wed Dec 31 1969 19:00:00 202152 ..c. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) 202119 ..c. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 ..c. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Thu Feb 22 2001 00:00:00 202152 .a.. r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:30:52 202152 m... r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Thu Feb 22 2001 17:31:54 202152 ...b r/rrwxrwxrwx 0 0 3 A:/_ublist1.wpd (deleted) Fri Feb 23 2001 16:17:18 205607 m... r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Fri Feb 23 2001 16:17:34 202119 m... r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:26 202119 ...b r/rrwxrwxrwx 0 0 4 A:/publist.wpd Fri Feb 23 2001 16:20:37 205607 ...b r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd Tue Sep 21 2010 00:00:00 202119 .a.. r/rrwxrwxrwx 0 0 4 A:/publist.wpd 205607 .a.. r/rrwxrwxrwx 0 0 7 A:/publistwkg.wpd $ icat 2004-M-008.0018.aff 4 | fido.sh - OK,110,x-fmt/44,WordPerfect for MS-DOS/Windows Document,WordPerfect for Windows 6.x - 12,202119,"STDIN" $ icat 2004-M-008.018.aff 4 > publist.wpd
  • 26. Accessioning Workflow Start accessioning Write-protect media Verify image K process Media TS Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data media as metadata K metadata Transfer package TS Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 27. fiwalk •C++ program with Python module for processing images •Outputs results in plain text key/value pairs, XML, CSV, or ARFF (for Weka data mining software) •Developed to support automated forensic processing by breaking it into three steps: extract, represent, process •Pluggable file-level metadata extraction (expects key/ value pairs) •Makes development easy and fast
  • 28. Sample fiwalk Output <?xml version='1.0' encoding='UTF-8'?> <fiwalk xmloutputversion='0.3'> <metadata> <!-- metadata about the disk image --> <creator> <!-- fiwalk provenance metadata (runtime environment, etc.) --></creator> <source> <image_filename>2004-M-008.dd-0018.001</image_filename> </source> <!-- fs start: 0 --> <volume offset='0'> <!-- volume metadata --> <fileobject> <filename>_ublist1.wpd</filename> <!-- more metadata about specific files within the image --> </fileobject> <fileobject/><!-- one for each file --> </volume> <runstats> <!-- runtime statistics --> </runstats> </fiwalk>
  • 29. Sample fiwalk Output <fileobject> <filename>_ublist1.wpd</filename> <partition>1</partition> <id>1</id> <name_type>r</name_type> <filesize>202152</filesize> <unalloc>1</unalloc> <used>1</used> <inode>3</inode> <meta_type>1</meta_type> <mode>511</mode> <nlink>0</nlink> <uid>0</uid> <gid>0</gid> <mtime>982881052</mtime> <atime>982818000</atime> <crtime>982881114</crtime> <libmagic>(Corel/WP)</libmagic> <byte_runs> <run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/> </byte_runs> <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest> <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest> </fileobject>
  • 30. Accessioning Workflow Start accessioning Write-protect media Verify image process Media Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data media as metadata al k metadata Transfer package fiw Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 31. Why fiwalk? •Faster (and more forensically sound) to extract metadata once rather than having to keep processing an image •Develop better assessments during accessioning process (directory structure significant? timestamps accurate?) •fiwalk’s output is something like a METS structMap •Building non-invasive assessment tools takes less time
  • 33. Gumshoe •Prototype application •Blacklight (Ruby on Rails + Solr) & Python indexing code •Indexing code works with fiwalk output or directly over a disk image (using fiwalk’s Python bindings) •Populates Solr index with all file-level metadata from fiwalk and text strings extracted from files •Code at http://github.com/anarchivist/gumshoe •Demo at http://xgumshoex.heroku.com/
  • 35. AFF4 •Emerging format, with tools still under development •Better for a distributed environment •Friendlier to micro-services philosophy •Clearer object and metadata model (RDF-based) •Container format is Zip64 •Cohen and Schatz 2010 (doi:10.1016/j.diin.2010.05.015) show hash based imaging as a more efficient alternative
  • 36. Sample AFF4 Metadata @prefix D1: <urn:aff4:652e4027-27fab2941> @prefix G1: <urn:aff4:19857a87-a190b2f87> @prefix G2: <urn:aff4:0a1fc78a-927bfacef> @prefix S1: <urn:aff4:652e4027-ffff01199> @prefix I1: <urn:aff4:9003027a-11199ffff> @prefix aff4: <http://afflib.org/2009/aff4/#> @prefix dcterms: <http://purl.org/dc/terms/> G1: { D1: aff4:serialNumber "zx322o91" D1: aff4:hash "3897450fa18094b13"^^aff4:md5 } G2: { S1: aff4:name "aff4imager" S1: aff4:vendor <http://aff.org/> S1: aff4:asserts G1: S1: aff4:type aff4:AcquisitionTool. S1: aff4:version "0.2" I1: aff4:type aff4:Image. I1: aff4:hash "3897450fa18094b13"^^aff4:md5 I1: dcterms:creator S1: } Adapted from Schatz and Cohen 2010 (doi:10.1007/978-3-642-15506-2_16), pp. 234-236
  • 37. Accessioning Workflow Start accessioning Write-protect media Verify image F 4 process Media AF Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data media as metadata metadata Transfer package F 4 AF Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest F 4 F 4 F 4 AF AF AF Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 38. Further Toolsets http://www.flickr.com/photos/oskay/5369749968/
  • 39.
  • 40. Thank You mark@matienzo.org http://matienzo.org/ twitter: @anarchivist This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.