NISO Webinar: Understanding Critical Elements of E-books: Part 2: Heritage Lost? Ensuring the Preservation of E-books

Presentation Transcript

  • Understanding Critical Elements of E-books: Acquiring, Sharing, and Preserving. Part 2: Heritage Lost? Ensuring the Preservation of E-books. May 23, 2012. Speakers: Jeremy York and Sheila Morrissey
  • HathiTrust: A Shared Digital Repository. We’re Preserving the Past, What About the Present? NISO Webinar: Ensuring the Preservation of E-Books, May 23, 2012. Jeremy York, Project Librarian, HathiTrust
  • Outline • About HathiTrust • Preservation and Access Strategies • What about the present?
  • Partnership: Arizona State University; Baylor University; Boston College; Boston University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Johns Hopkins University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Washington University; Yale University Library
  • The Name • The meaning behind the name – Hathi (hah-tee): Hindi for elephant – Big, strong – Never forgets, wise – Secure – Trustworthy
  • Governance (diagram labels) • Strategic Advisory Board: guidance on policy, planning • 12-member Board of Governors • Executive Committee • Executive Director • Budget/finances, decision-making • HathiTrust
  • Digital Repository • Launched 2008 • Initial focus on digitized book and journal content – 10,309,742 total volumes – 5,464,306 book titles – 271,119 serial titles – 3,001,018 public domain (~29%) • “Light” archive
  • Collections and Collaboration • Comprehensive collection – Preservation…with Access • Shared strategies – Copyright – Collection management, development – Preservation – Discovery / Use – Bibliographic Indeterminacy – Efficient user services • Public Good
  • Preservation and Access
  • Repository Philosophy/Design • OAIS/TRAC • Consistency • Standardization • Simplicity (in design, not function) • Practicality • Sustainability
  • What about the Present?
  • Dates, Collections, Languages (chart of holdings by language): English 48%; German 9%; French 7%; Spanish 5%; Chinese 4%; Russian 4%; Japanese 3%; Italian 3%; Arabic 2%; Latin 1%; remaining languages 14%
  • To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
  • Rights holders open access • Publishers deposit master files • Publish directly into the repository
  • jPach: Journal Publishing in HathiTrust • http:// • Package of tools to enable publication of open access journals • Includes modifications to existing code base; new components to facilitate ingest, display, and discoverability of born-digital open-access journal literature • Allow integration with popular journal publishing tools such as Open Journal Systems (OJS)
  • Key Elements • Openness – Content must be licensed for perpetual open access • Additional formats – Fixity of bitstream guaranteed where preservation specifications cannot be developed • Allow download of content not rendered in the interface • Support articles and contextual information (lists of editors, submission requirements) • Support for revisions to content
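The "fixity of bitstream" guarantee on the slide above is commonly implemented by recording a checksum at ingest and recomputing it later. The following is a minimal sketch of that general technique, not HathiTrust's actual implementation: the choice of SHA-256 and the function names `sha256_of` and `fixity_ok` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True if the file's current digest matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest.lower()
```

A repository would typically store the digest alongside the object at deposit time and rerun `fixity_ok` on a schedule; any mismatch signals that the bitstream has changed.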
  • Publishing into the Repository
  • (Diagram labels: Higher Education; Source / Editorial; Market; Archive)
  • Publishing into the Repository • Openness – Continual stewardship and access • Sustainability – Library as engine of communication
  • How to find out more • About: http:// • Twitter: http:// • Facebook: http:// • Monthly newsletter: – – RSS http:// • Contact us: • Blogs: http:// – Large-scale Search – Perspectives from HathiTrust
  • Thank you very much!
  • File Format Considerations in the Preservation of e-Books. Sheila Morrissey, Senior Research Developer, Portico. NISO Webinar: Heritage Lost? Ensuring the Preservation of E-books, May 23, 2012
  • Portico - Third Party Preservation Portico is among the largest community-supported digital archives in the world. Working with libraries, publishers, and funders, we preserve e-journals, e-books, and other electronic scholarly content to ensure researchers and students will have access to it in the future.
  • Portico - Participating Content Over 2,000 societies and associations have committed content to Portico through 147 publisher agreements. Committed Content » E-journal titles 13,675 » E-book titles 129,781 » D-collections 46
  • Portico – Preserved Content Preserved Content »  E-journal titles 9,568 »  E-book titles 16,861 »  D-collections 12 »  Archival Units 19,433,869 »  Preserved Files 319,737,011
  • Portico - Audit and Certification In 2010, Portico became the first digital preservation service to be independently audited by the Center for Research Libraries (CRL) and subsequently certified as a trusted, reliable digital preservation solution that serves the needs of the library community.
  • Portico - History • 2002: Launch of Electronic Archiving Initiative by JSTOR • 2005: Portico launched • 2006: Portico ingests initial e-journal content into the archive • 2007: Portico makes first trigger title available • 2009: Portico ingests initial e-book content into the archive • 2009: CRL audit of Portico begins • 2009: Portico fulfills first PCA claim • 2010: Portico ingests initial d-collection content
  • Digital Preservation Digital preservation is the series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long-term. The key goals of digital preservation include: • Usability: the intellectual content of the item must remain usable via the delivery mechanism of current technology • Authenticity: the provenance of the content must be proven and the content an authentic replica of the original • Discoverability: the content must have logical bibliographic metadata so that it can be found by end users through time • Accessibility: the content must be available for use to the appropriate community
  • Preservation: Legal aspects Legal right to preserve content »  Not always the same as access rights »  Specified in contracts »  Includes embedded or supplemental files, such as images »  DRM removed
  • Usability - Preserve Intellectual Content
  • Usability: Rendition and Delivery Content is rendered to support current delivery platform, i.e. web browser. … rendered & delivered … Rendition engine can be modified to meet new technology requirements.
  • Portico – Another Look at the History • 2002: Launch of Electronic Archiving Initiative by JSTOR • 2005: Portico launched • 2006: Portico ingests initial e-journal content into the archive • 2007: Portico makes first trigger title available; iPhone, Kindle 1 • 2009: Portico ingests initial e-book content; Kindle 2, Nook • 2010: Portico ingests initial d-collection content; iPad 1, Nook Color • 2011: iPad 2, Kindle Fire, Nook Simple Touch, ePub3 • 2012: iPad 3
  • Usability: Anticipated usage …
  • Usability: … and new usage
  • Authenticity, Discoverability: Preservation Context
  • Context
  • Formats: Packages
  • E-Book Packages in Portico Submissions Flat directory » ONIX XML file with bibliographic metadata, one PDF file per book » Front cover image JPG files
  • E-Book Packages in Portico Submissions TAR file (multiple books per file) » XML manifest file » One directory for each book: – Proprietary XML file (3 possible versions of XML) with bibliographic metadata – Subdirectory with files for front matter “chapters” (XML, PDF, OCR of PDF) – Subdirectory with files for regular “chapters” (XML, PDF, OCR of PDF) – Subdirectory with files for back matter “chapters” (XML, PDF, OCR of PDF) – Subdirectory with TIFF file for cover image of book
  • E-Book Packages in Portico Submissions ZIP file (sometimes one book per file, sometimes multiple books) » Sometimes flat (all books at one level) » Sometimes one directory for each book: – Sometimes cover images (JPG or TIFF) – Sometimes one PDF for entire book in addition to PDF for each chapter » Sometimes a manifest
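Because ZIP submissions vary so much ("sometimes flat", "sometimes a manifest"), an ingest workflow usually has to inspect each package before processing it. The sketch below shows one way such an inspection might look. It is not Portico's tooling: the function name `describe_zip_package`, the report keys, and the rule that a manifest is any file with "manifest" in its name are all assumptions made for illustration.

```python
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}


def describe_zip_package(zip_path: str) -> dict:
    """Summarize the layout of a submitted ZIP package (heuristic only)."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip explicit directory entries; keep only file members.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    paths = [PurePosixPath(n) for n in names]
    # A member with more than one path part lives inside a directory.
    top_dirs = sorted({p.parts[0] for p in paths if len(p.parts) > 1})
    suffixes = [p.suffix.lower() for p in paths]
    return {
        "layout": "per-directory" if top_dirs else "flat",
        "directories": top_dirs,
        "pdf_count": suffixes.count(".pdf"),
        "image_count": sum(1 for s in suffixes if s in IMAGE_SUFFIXES),
        # Assumption: a manifest is recognizable by its file name.
        "has_manifest": any("manifest" in p.name.lower() for p in paths),
    }
```

The report only describes what is present; deciding whether a package is acceptable would still depend on the agreement with each publisher.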
  • Formats: Text Content. Rendered text: Hello, World!!
  • Formats: Text Content. Rendered text: Hello, World!! PDF content stream: BT /H2 <</MCID 0 >>BDC /CS0 cs 0.31 0.506 0.741 scn /TT0 1 Tf -0.004 Tc 0.006 Tw 12.96 0 0 12.96 72 697.68 Tm [(H)-4(e)-1(l)-1(l)-11 (o,)-3( W)-15(or)-6 (l)-11(d!)-12(!)]TJ 0 Tc 0 Tw 6.481 0 Td ( )Tj EMC ET
  • Formats: Text Content. Rendered text: Hello, World!! HTML source: <html> <head> <style type="text/css"> <!-- p { color: #4F81BD; font-family: serif; font-weight: bold; font-size: 13pt; } --> </style> </head> <body><p>Hello, World!! </p></body> </html>
  • Trade-offs: Expressiveness vs. Simplicity. Rendered text: Hello, World!!
  • Formats: Rich Content. Rendered text: Hello, World!!
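The trade-off between expressiveness and simplicity shows up as soon as one tries to recover the plain text from each encoding. The sketch below does this for the two "Hello, World!!" samples on the preceding slides. It is illustrative only: `text_from_tj` handles just the simple TJ array shown (no escaped or nested parentheses, no font encoding lookups), the HTML sample is abridged, and both function names are assumptions.

```python
import re
from html.parser import HTMLParser

# The TJ operand from the PDF slide: strings interleaved with kerning offsets.
TJ_ARRAY = "[(H)-4(e)-1(l)-1(l)-11 (o,)-3( W)-15(or)-6 (l)-11(d!)-12(!)]TJ"

# Abridged version of the HTML slide.
HTML_SAMPLE = (
    '<html><head><style type="text/css">p { color: #4F81BD; }</style></head>'
    "<body><p>Hello, World!! </p></body></html>"
)


def text_from_tj(operand: str) -> str:
    """Join the literal strings of a TJ array, dropping the kerning numbers."""
    return "".join(re.findall(r"\(([^()]*)\)", operand))


class BodyText(HTMLParser):
    """Collect character data that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.chunks.append(data)


def text_from_html(markup: str) -> str:
    """Return the text content of the document body, stripped of edge whitespace."""
    parser = BodyText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()
```

Both calls recover the same string, but the PDF path depends on reassembling fragments that were split for positioning, while the HTML path reads the text directly from the markup.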
  • Formats: Rich Content. Rendered text: Hello, World!! PDF content stream: BT /H2 <</MCID 0 >>BDC /CS0 cs 0.31 0.506 0.741 scn /TT0 1 Tf -0.004 Tc 0.006 Tw 12.96 0 0 12.96 264 697.68 Tm [(H)-4(e)-1(l)-2(l)-11(o,)-3( W)-15(or)-6 (l)-11(d!)-12(!)]TJ 0 Tc 0 Tw 6.481 0 Td ( )Tj EMC /P <</MCID 1 >>BDC /CS1 cs 0 scn /TT1 1 Tf 11.04 0 0 11.04 72 682.08 Tm ( )Tj EMC /P <</MCID 2 >>BDC 36.478 -24.185 Td ( )Tj EMC ET /Figure <</MCID 3 >>BDC q /GS0 gs 336 0 0 252 139.1000061 414.6812744 cm /Im0 Do Q EMC
  • Formats: Rich Content. Rendered text: Hello, World!! (as viewed in iText RUPS)
  • Formats: Rich Content. Rendered text: Hello, World!! HTML source: <html> <head> <style type="text/css"> <!-- p { color: #4F81BD; font-family: serif; font-weight: bold; font-size: 13pt; }--> </style> </head> <body><p>Hello, World!! <br/><span><IMG width="447" height="336" src="images/Image_001.jpg"/></span></p></body> </html>
  • Trade-offs: Encapsulation vs. Articulation • Encapsulated: mydir/ containing myFile.pdf • Articulated: mydir/ containing myFile.html and images/Image01.jpg
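With an articulated package, the archive has to know which separate files a document depends on, since losing `images/Image01.jpg` silently breaks the HTML. A minimal sketch of how those dependencies might be listed is below. The class and function names are hypothetical, and only `img` and `link` references are collected; a real dependency scan would cover more element types.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCollector(HTMLParser):
    """Collect files referenced by <img src=...> and <link href=...>."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call,
        # and routes self-closing tags such as <IMG .../> here as well.
        for name, value in attrs:
            is_image = tag == "img" and name == "src"
            is_link = tag == "link" and name == "href"
            if value and (is_image or is_link):
                self.resources.append(value)


def external_resources(markup: str) -> List[str]:
    """Return referenced resource paths in document order."""
    collector = ResourceCollector()
    collector.feed(markup)
    collector.close()
    return collector.resources
```

Run against the rich-content HTML slide, this would report the image path, which can then be checked against the files actually delivered in the package.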
  • E-book formats in Portico Submissions PDF »  One file per chapter »  One file per book TIFF »  One file per page JPEG »  One file per page XML »  For bibliographic metadata »  Proprietary »  ONIX variants »  NLM variants
  • Looking ahead: EPUB 3 EPUB 3 ( ) »  “EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content-- including HTML5, CSS, SVG, images, and other resources-- for distribution in a single-file format.”
  • Looking ahead: EPUB 3 EPUB 3 »  Web standards for key component technologies »  Free and open specification »  Must work in at least some appliance   Outside publisher’s own workflow
  • EPUB3 Packaging
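An EPUB file is a ZIP container whose first entry must be an uncompressed file named `mimetype` containing exactly `application/epub+zip`, and which must include `META-INF/container.xml`. The sketch below checks only those container-level rules; the function name `check_epub_container` and the wording of the messages are assumptions, and passing these checks does not make a file a valid EPUB 3 publication.

```python
import zipfile
from typing import List

EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_container(path: str) -> List[str]:
    """Return a list of container-level problems; an empty list means none found."""
    problems = []
    with zipfile.ZipFile(path) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            first = entries[0]
            # The mimetype entry must be stored without compression.
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if archive.read(first) != EPUB_MIMETYPE:
                problems.append("'mimetype' content is not application/epub+zip")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```

A check like this is a cheap first gate at ingest, ahead of fuller validation of the package document and content files.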
  • EPUB3 Formats “Profiles” of standard formats for authoring content »  XHTML5, SVG 1.1, CSS 2.1, CSS 3   Constraints (extensions to HTML5, constraints on SVG)   Specs a “moving target” Conforming readers must support rendition of certain formats »  Image, audio, video   Defined fallbacks Globalization, Encoding, Fonts
  • Complications: The New “Browser Wars” Amazon »  Announces it is replacing MOBI with K8 iBooks »  Different mimetype »  Proprietary extension of CSS Media Queries »  Proprietary XML namespace »  Etc.
  • Complications: “More What You’d Call ‘Guidelines’ Than Actual Rules” Pirates of the Caribbean: The Curse of the Black Pearl. The Walt Disney Company (2003)
  • Questions or Comments? Sheila @sheilaMorr