Origin: It started with a simple need
As the Library of Congress began to deal with increasing amounts of digital content, they faced some issues:
• How do they know what files they have and who they belong to?
• How do they get files from where they are to where they need to be?
The Library of Congress Repository Development Center began working on a solution: tools for transfer activities, including:
• Adding digital content to the collections (whether internal or external data)
• Moving digital content between storage systems
• Reviewing digital files for fixity, quality, and/or authoritativeness
• Inventorying and recording transfer life cycle events for digital files
Origin: It evolved naturally from that need
Here is what Leslie Johnston (Library of Congress contributor) and John Kunze (California Digital Library co-creator) shared about the project’s origin:
Origin: But what is it exactly?
• The name comes from the concept of “bag it and tag it”. BagIt allows for the transfer of digital files by packaging them into a digital “bag” that is accessible for the library to download.
• A bag is like a folder or directory on a computer; it can hold documents, photos, movies, music, or even other folders.
• Bags consist of three main elements:
  1. A bag declaration text file (like a seal of authenticity)
  2. A text-file manifest (tag) listing the files in the collection
  3. A subdirectory filled with the digital content
• A bag can also contain an optional text file with a small amount of administrative metadata (e.g. contact info for the collection owner and a description of the collection).
• Once a bag is sent, the receiving computer can analyze the manifest and run checksums on the contents; if the checksums match (i.e. the files are unchanged), the transfer is successful.
• It’s that simple!
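The three elements listed above can be sketched in a few lines of Python. This is a minimal illustration, not one of the Library's tools: the helper name `make_minimal_bag` and the sample files are hypothetical, and the file names `bagit.txt`, `manifest-md5.txt`, and `data/` are assumed from the BagIt specification's naming conventions rather than taken from these slides. MD5 is assumed as the checksum algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def make_minimal_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write a declaration, a manifest, and a payload directory under bag_dir."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)

    # Element 1: the bag declaration (assumed spec version 0.97).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Element 3: the content itself, plus one manifest line per file.
    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # Element 2: the manifest listing every payload file with its checksum.
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


# Hypothetical sample content, written to a temporary directory.
bag = Path(tempfile.mkdtemp()) / "example_bag"
make_minimal_bag(bag, {"letter.txt": b"hello", "photo.bin": b"\x00\x01"})
bag_listing = sorted(p.relative_to(bag).as_posix() for p in bag.rglob("*"))
print(bag_listing)
```

The printed listing shows that a bag is an ordinary directory: two small text files sitting next to a `data` folder that holds the content unchanged.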
Evolution: Community involvement
• Working with John Kunze of the California Digital Library, Andy Boyko, Justin Littman, Liz Madden, and Brian Vargas of the Library produced a draft version of BagIt (initially referred to as the “LC Package Specification”) in December 2008.
• This was posted on the LOC and California Digital Library sites and as an internet “Request for Comment” (RFC).
• It was also promoted on blogs, in conference presentations, articles, etc. NDIIPP strongly encouraged partners to “bag” their content for transfer.
• Through the process, project managers began learning what was still missing and where the specification needed clarification.
• The team then launched a Digital Curation Google group to support the activities of this participatory community and encourage open, public discussion.
• BagIt is now on version 0.97, having undergone several iterative revisions (6 drafts to date).
Evolution: Tools
• BagIt was intended to be simple enough for users to work with directly. However, the community increasingly began to request tools to help with the use of BagIt, as well as the source code so that they could develop further tools of their own.
• The LOC developed three initial scripts, key utilities for the movement and validation of bagged content, and released them through SourceForge on December 18, 2008 under a BSD license (essentially open-sourced). These tools have been rather popular, with 4,617 downloads to date (31 this week).
  • The Parallel Retriever: automates the retrieval of remote resources such as web pages, files on an FTP server, or files on a network drive, and then wraps them into a package that meets the BagIt specification.
  • The Bag Validator Script: checks that a bag meets the standards of the specification (i.e. all files listed in the manifest are in the data directory, there are no files in the directory that are not in the manifest, and there are no duplicate entries in the manifest).
  • VerifyIt Script: verifies the checksums of files in a bag against the manifest each time the files are moved or copied.
• They later released the BagIt Library (BIL), a Java library that supports key functionality such as creating, manipulating, validating, and verifying bags, and reading from and writing to a number of formats.
• A client-side Bagger application was also underway in 2009. Bagger is intended to provide a graphical desktop tool for the bagging of content, and ideally will require no client-side IT support or infrastructure.
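The three checks attributed to the Bag Validator Script above can be sketched as follows. This is not the Library's script: the function name `check_bag_completeness` is hypothetical, and the manifest name `manifest-md5.txt`, the `data` directory, and the "checksum, whitespace, path" line layout are assumed from the BagIt specification. The demo uses placeholder checksums because this sketch compares file lists only, not file contents.

```python
import tempfile
from collections import Counter
from pathlib import Path


def check_bag_completeness(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> list[str]:
    """Return a list of problems; an empty list means the file lists agree."""
    problems = []

    # Collect the paths named in the manifest (second field of each line).
    listed = []
    for line in (bag_dir / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            _checksum, path = line.split(maxsplit=1)
            listed.append(path)

    # Check: no duplicate entries in the manifest.
    for path, count in Counter(listed).items():
        if count > 1:
            problems.append(f"duplicate manifest entry: {path}")

    # Collect the files actually present under data/.
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }

    # Check: every listed file exists, and every existing file is listed.
    for path in sorted(set(listed) - on_disk):
        problems.append(f"listed but missing: {path}")
    for path in sorted(on_disk - set(listed)):
        problems.append(f"present but not listed: {path}")
    return problems


# A deliberately broken demo bag (hypothetical files, placeholder checksums).
demo = Path(tempfile.mkdtemp())
(demo / "data").mkdir()
(demo / "data" / "a.txt").write_text("a", encoding="utf-8")
(demo / "data" / "extra.txt").write_text("x", encoding="utf-8")
fake = "0" * 32
(demo / "manifest-md5.txt").write_text(
    f"{fake}  data/a.txt\n{fake}  data/a.txt\n{fake}  data/b.txt\n",
    encoding="utf-8",
)
problems = check_bag_completeness(demo)
for problem in problems:
    print(problem)
```

Returning a list of problems rather than a single pass/fail flag lets the receiver see every discrepancy in one run.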
Evolution: Adaptations
The BagIt tool set became the LOC’s first open source software release. Since then, several BagIt-specific tools have been created to simplify the process in several programming environments (it was originally designed for use with Unix utilities):
• Python BagIt Library: at least two recent versions exist, one by Andrew Hankinson (https://github.com/ahankinson/bagit) and the other by Ed Summers (https://github.com/edsu/bagit). These libraries can be used to create BagIt-style packages programmatically in Python or from the command line.
• Drupal: Mark Jordan developed a Drupal module for BagIt (http://drupal.org/project/bagit).
• Ruby: Francesco Lazzarino at the Florida Center for Library Automation developed a Ruby adaptation of BagIt (https://github.com/tipr/bagit).
• PHP: A PHP implementation of BagIt was created by Wayne Graham and Mark Jordan (https://github.com/scholarslab/BagItPHP).
• RESTful Bag Storage Proposal: Chris Adams developed this draft protocol for serving BagIt repositories RESTfully (https://github.com/acdha/restful-bag-server).
Practicalities: Where does BagIt fit?
“Why are such transfer tools and processes so important? Transfer processes are not surprisingly linked with preservation, as the tasks performed during the transfer of files must follow a documented workflow and be recorded in order to mitigate preservation risks... While initial interest in this problem space came from the need to better manage transfers from external partners to the Library, the transfer and transport of files within the organization for the purpose of archiving, transformation, and delivery is an increasingly large part of daily operations. The digitization of an item can create one or hundreds of files, each of which might have many derivative versions, and which might reside in multiple locations simultaneously to serve different purposes. Developing tools to manage such transfer tasks reduce the number of tasks performed and tracked by humans, and automatically provides for the validation and verification of files with each transfer event.”
-- from “Releasing Open Source at the Library of Congress” by Leslie Johnston
Practicalities: What’s so special about BagIt?
• Bags are uncomplicated, and are therefore able to transcend differences in institutional data, data architecture, formats, and practices.
• Bags have built-in inventory checking (validation) to help ensure that the content is transferred unchanged and fully intact.
• Unlike packaging tools such as zip or tar, BagIt does not require special software to extract the files.
• Zip and tar condense all included files into a single archive file. BagIt instead creates a logical package in which files keep their individuality and are simply stored in a traditional folder or directory container.
• There is no limit to the number or type of files that can be transferred through the use of BagIt.
• Bags are flexible and can work in many different settings, including situations when the content is located in many different places.
• A bag’s metadata is machine readable, meaning that data can be ingested automatically.
• Bags can be used over computer networks or through the use of portable storage devices.
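To illustrate what "machine readable" metadata can mean in practice, here is a rough sketch that parses a bag's optional administrative metadata file. It assumes the "Label: value" layout, with indented continuation lines, that the BagIt specification describes for `bag-info.txt`; the function name `parse_bag_info` and the sample values are hypothetical.

```python
def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Parse 'Label: value' lines; indented lines continue the previous value."""
    fields: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # A line starting with whitespace continues the previous field.
        if line[0] in " \t" and last_label is not None:
            fields[last_label][-1] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        last_label = label.strip()
        # Labels may repeat, so each label maps to a list of values.
        fields.setdefault(last_label, []).append(value.strip())
    return fields


# Hypothetical metadata for illustration only.
sample = (
    "Source-Organization: Example University Library\n"
    "Contact-Name: A. Archivist\n"
    "External-Description: Scanned maps from a hypothetical\n"
    "  collection, used only for illustration.\n"
)
info = parse_bag_info(sample)
print(info["Contact-Name"])
```

Because the metadata is plain labeled text, an ingest system can read contact and description fields without any format-specific software.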
Practicalities: Who Is Using BagIt?
• As of 2009, a significant percentage of the 130 NDIIPP partners were already utilizing the BagIt specification in their preservation transfers to the Library.
• A few of the organizations that are using BagIt include:
  • The University of Virginia Libraries
  • The Stanford Digital Repository
  • Archivematica
  • Ghent University Library
  • The Dryad Data Repository
  • The University of North Texas
  • Central Connecticut State University
  • Towards Interoperable Preservation Repositories (including the Florida Center for Library Automation, Cornell University, and New York University)
Practicalities: BagIt Usage Highlights
• The Stanford Digital Repository: Having had success using BagIt to move geospatial data from the National Geospatial Digital Archive project from Stanford to the Library of Congress, they settled on BagIt as the primary transfer format for content being deposited into their repository (the ingest stage of OAIS). (http://www.dlib.org/dlib/september10/cramer/09cramer.html)
• Ghent University Library: They currently use BagIt as an archival format for their digital collections. They also use it as an interchange format for the addition of new external collections (e.g. Google Books) to the local repositories. (http://www.slideshare.net/hochstenbach/grep-ghent-university-repository)
• The Dryad Data Repository: This repository of data underlying scientific publications is using the BagIt specification to share data and related metadata with TreeBASE, a repository of phylogenetic information. (http://wiki.datadryad.org/BagIt_Handshaking)
• Towards Interoperable Preservation Repositories (TIPR): This is a partnership between the Florida Center for Library Automation, Cornell University, and New York University to develop, test, and promote a standard interchange format for exchanging information packages among OAIS-based repositories. The proposed format uses the BagIt specification to exchange package bundles via HTTP. (http://wiki.fcla.edu:8000/TIPR); (https://github.com/tipr/bagit/)
The Process: Tutorials
• The North Carolina State Archives has provided a set of 10 thorough tutorials to explain the BagIt process. The first video includes a summary of the steps involved; the second set explains the installation process; and the third details creation and verification step-by-step: http://www.youtube.com/playlist?list=PL1763D432BE25663D&feature=plcp
• The NDIIPP-funded GeoMAPP project has published a BagIt User Guide that can be found at: http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf
• The Library of Congress NDIIPP Partner Tools and Services Inventory page includes a brief description of BagIt, a PDF of the latest version of the BagIt specification, links to some of the BagIt tools, and a brief video demonstrating the BagIt process: http://www.digitalpreservation.gov/partners/resources/tools/index.html#b
Four Steps to use BagIt
The process is as simple as 1, 2, 3, 4…
1. Prepare Files for Transfer
2. Create & Verify Bag
3. Copy & Verify Bag
4. Extract Files for Use
Image courtesy of the GeoMapp.net BagIt Guide: http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf
Prepare files for transfer
• A bag must have three things: a bag declaration, a list of the content files (manifest), and the content itself
• Validate content and metadata
• Perform virus check (suggested)
Create and verify the bag
• Attach portable drive to computer (or use shared drive)
• Create a new folder to serve as the holding place for your bag
• Use the “BagIt” command to create the bag on this drive
• Verify the bag by using the “verifyvalid” command
Copy and verify the bag
• Copy the bag to a staging area
• Validate the received bag
• Run virus check software on the bag
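The copy-then-validate step above might look like the following in Python. This is a sketch under stated assumptions, not the Library's tooling: the helper `verify_checksums` and the demo file are hypothetical, the manifest is assumed to be named `manifest-md5.txt` with MD5 checksums as in the BagIt specification, and the virus check is left out.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def verify_checksums(bag_dir: Path, manifest_name: str = "manifest-md5.txt") -> list[str]:
    """Return the manifest paths whose files are missing or have changed."""
    failures = []
    for line in (bag_dir / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        # Recompute the checksum of the received file and compare.
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures


# Build a tiny source bag (hypothetical content) in a temporary directory.
root = Path(tempfile.mkdtemp())
source = root / "source_bag"
(source / "data").mkdir(parents=True)
(source / "data" / "report.txt").write_bytes(b"quarterly figures")
digest = hashlib.md5(b"quarterly figures").hexdigest()
(source / "manifest-md5.txt").write_text(f"{digest}  data/report.txt\n", encoding="utf-8")

# Copy the bag to a staging area, then validate the received copy.
staged = Path(shutil.copytree(source, root / "staging" / "source_bag"))
clean_result = verify_checksums(staged)

# Simulate damage in transit to show what a failed check reports.
(staged / "data" / "report.txt").write_bytes(b"tampered")
damaged_result = verify_checksums(staged)
print(clean_result, damaged_result)
```

An empty result means every file in the staged copy matches the manifest; any path in the list identifies a file that should be re-transferred.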
Extract files for use
• Unpack the bag
• Your files are now ready for use!
Challenges: Limiting Usage Factors
• Lack of information: The LOC website contains little information aside from what is included in their brief 3-minute video and short printed description. It is hard to find much more from outside online sources either. Further example implementations would help users understand how BagIt can be used and what its advantages are over other formats such as zip files.
• Learning curve: Most of the documentation language is complicated and would not be easy for the average person to understand. BagIt doesn’t currently have an easy-to-use GUI to make the process simple for non-techie users. Bagger may help with this, but there is little information out there about the Bagger interface.
And that concludes our tour of BagIt… Any questions?
Additional Sources
"BagIt File Packaging Format." IETF Documents. Internet Engineering Task Force, 15 Apr 2011. Web. 1 Apr 2012. <http://tools.ietf.org/html/draft-kunze-bagit-06>.
BagIt: Transferring Content for Digital Preservation. 2009. Video. The Library of Congress, Washington, DC. Web. 1 Apr 2012. <http://www.digitalpreservation.gov/multimedia/videos/bagit0609.html>.
Johnston, Leslie. "Releasing Open Source at the Library of Congress." OCLC Systems & Services: International Digital Library Perspectives. 26.2 (2010): 94-102.
Johnston, Leslie, and John Kunze. "BagIt funding and versions." 29 Mar 2012. N.p., Online Posting to Digital Curation Google Group. Web. 1 Apr 2012. <http://groups.google.com/group/digital-curation/browse_thread/thread/ace8eafae819762b?pli=1>.
Lavoie, Brian. "The Open Archival Information System Reference Model: Introductory Guide." Technology Watch Report. 04-01 (2004).
Lazorchak, Butch. "From There to Here, from Here to There, Digital Content is Everywhere!" The Signal: Digital Preservation. The Library of Congress, 3 Jan 2012. Web. 1 Apr 2012. <http://blogs.loc.gov/digitalpreservation/2012/01/from-there-to-here-from-here-to-there-digital-content-is-everywhere/>.
Willett, Perry. "BagIt File Packaging Format." California Digital Library, 10 Feb 2012. Web. 1 Apr 2012. <https://wiki.ucop.edu/display/Curation/BagIt>.