NISO Webinar: Understanding Critical Elements of E-books: Part 2: Heritage Lost? Ensuring the Preservation of E-books
Transcript of "NISO Webinar: Understanding Critical Elements of E-books: Part 2: Heritage Lost? Ensuring the Preservation of E-books"

  1. http://www.niso.org/news/events/2012/nisowebinars/ebooks_preservation/
     Understanding Critical Elements of E-books: Acquiring, Sharing, and Preserving
     Part 2: Heritage Lost? Ensuring the Preservation of E-books
     May 23, 2012
     Speakers: Jeremy York and Sheila Morrissey
  2. HathiTrust: A Shared Digital Repository
     We’re Preserving the Past, What About the Present?
     NISO Webinar: Ensuring the Preservation of E-Books, May 23, 2012
     Jeremy York, Project Librarian, HathiTrust
  3. Outline
     • About HathiTrust
     • Preservation and Access Strategies
     • What about the present?
  4. Partnership
     Arizona State University; Baylor University; Boston College; Boston University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Johns Hopkins University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Washington University; Yale University Library
  5. The Name
     • The meaning behind the name
       – Hathi (hah-tee): Hindi for elephant
       – Big, strong
       – Never forgets, wise
       – Secure
       – Trustworthy
  6. [Governance diagram; box labels: Strategic Advisory Board; Guidance on Policy, Planning; Executive Committee; Budget/Finances; Decision-making; HathiTrust]
     • 12-member Board of Governors
     • Executive Committee
     • Executive Director
  7. Digital Repository
     • Launched 2008
     • Initial focus on digitized book and journal content
       – 10,309,742 total volumes
       – 5,464,306 book titles
       – 271,119 serial titles
       – 3,001,018 public domain (~29%)
     • “Light” archive
  8. Collections and Collaboration
     • Comprehensive collection
       – Preservation ... with Access
     • Shared strategies
       – Copyright
       – Collection management, development
       – Preservation
       – Discovery / Use
       – Bibliographic Indeterminacy
       – Efficient user services
     • Public Good
  9. Preservation and Access
  10. Repository Philosophy/Design
      • OAIS/TRAC
      • Consistency
      • Standardization
      • Simplicity (in design, not function)
      • Practicality
      • Sustainability
  11. What about the Present?
  12. [Pie chart; tabs: Dates, Collections, Languages] Languages: English 48%; German 9%; French 7%; Spanish 5%; Chinese 4%; Russian 4%; Japanese 3%; Italian 3%; Arabic 2%; Latin 1%; Remaining Languages 14%
  13. To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
  14. • Rights holders open access
      • Publishers deposit master files
      • Publish directly into the repository
  15. jPach: Journal Publishing in HathiTrust
      • http://lib.umich.edu/jpach
      • Package of tools to enable publication of open access journals
      • Includes modifications to existing code base; new components to facilitate ingest, display, and discoverability of born-digital open-access journal literature
      • Allow integration with popular journal publishing tools such as Open Journal Systems (OJS)
  16. Key Elements
      • Openness
        – Content must be licensed for perpetual open access
      • Additional formats
        – Fixity of bitstream guaranteed where preservation specifications cannot be developed
      • Allow download of content not rendered in the interface
      • Support articles and contextual information (lists of editors, submission requirements)
      • Support for revisions to content
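
The "fixity of bitstream" guarantee on slide 16 can be illustrated with a checksum comparison. This is a minimal sketch, assuming a SHA-256 digest is recorded at ingest and recomputed later; the slides do not say which algorithm or tooling HathiTrust uses, and the function and file names below are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest (assumed workflow)."""
    return sha256_of_file(path) == recorded_digest.strip().lower()


# Demo on a throwaway file: record a digest, then simulate a bit-level change.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "article.bin")  # hypothetical file name
    with open(sample, "wb") as handle:
        handle.write(b"Hello, World!!")
    recorded = sha256_of_file(sample)
    print(fixity_ok(sample, recorded))  # bitstream unchanged since "ingest"
    with open(sample, "ab") as handle:
        handle.write(b"\x00")  # append one byte
    print(fixity_ok(sample, recorded))  # bitstream altered
```

A check like this says nothing about whether the format can still be rendered; it only confirms that the stored bytes are the bytes that were deposited, which is what remains possible when no format-specific preservation specification exists.
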
  17. Publishing into the Repository
  18. [Diagram labels: Higher Education; Source / Editorial; Market; Archive]
  19. Publishing into the Repository
      • Openness
        – Continual stewardship and access
      • Sustainability
        – Library as engine of communication
  20. How to find out more
      • About: http://www.hathitrust.org/about
      • Twitter: http://twitter.com/hathitrust
      • Facebook: http://www.facebook.com/hathitrust
      • Monthly newsletter:
        – http://www.hathitrust.org/updates
        – RSS: http://www.hathitrust.org/updates_rss
      • Contact us: feedback@issues.hathitrust.org
      • Blogs: http://www.hathitrust.org/blogs
        – Large-scale Search
        – Perspectives from HathiTrust
  21. Thank you very much!
  22. File Format Considerations in the Preservation of e-Books
      Sheila Morrissey, Senior Research Developer, Portico
      NISO Webinar: Heritage Lost? Ensuring the Preservation of E-books
      May 23, 2012
  23. Portico - Third Party Preservation
      Portico is among the largest community-supported digital archives in the world. Working with libraries, publishers, and funders, we preserve e-journals, e-books, and other electronic scholarly content to ensure researchers and students will have access to it in the future.
  24. Portico - Participating Content
      Over 2,000 societies and associations have committed content to Portico through 147 publisher agreements.
      Committed Content
      » E-journal titles: 13,675
      » E-book titles: 129,781
      » D-collections: 46
  25. Portico - Preserved Content
      Preserved Content
      » E-journal titles: 9,568
      » E-book titles: 16,861
      » D-collections: 12
      » Archival Units: 19,433,869
      » Preserved Files: 319,737,011
  26. Portico - Audit and Certification
      In 2010, Portico became the first digital preservation service to be independently audited by the Center for Research Libraries (CRL) and subsequently certified as a trusted, reliable digital preservation solution that serves the needs of the library community.
  27. Portico - History
      » 2002: Launch of Electronic Archiving Initiative by JSTOR
      » 2005: Portico launched
      » 2006: Portico ingests initial e-journal content into the archive
      » 2007: Portico makes first trigger title available
      » 2009: Portico ingests initial e-book content into the archive
      » 2009: CRL audit of Portico begins
      » 2009: Portico fulfills first PCA claim
      » 2010: Portico ingests initial d-collection content
  28. Digital Preservation
      Digital preservation is the series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long term. The key goals of digital preservation include:
      • Usability: the intellectual content of the item must remain usable via the delivery mechanism of current technology
      • Authenticity: the provenance of the content must be proven and the content an authentic replica of the original
      • Discoverability: the content must have logical bibliographic metadata so that it can be found by end users through time
      • Accessibility: the content must be available for use to the appropriate community
  29. Preservation: Legal aspects
      Legal right to preserve content
      » Not always the same as access rights
      » Specified in contracts
      » Includes embedded or supplemental files, such as images
      » DRM removed
  30. Usability - Preserve Intellectual Content
  31. Usability - Preserve Intellectual Content
  32. Usability: Rendition and Delivery
      Content is rendered to support current delivery platform, i.e., web browser.
      … rendered & delivered …
      Rendition engine can be modified to meet new technology requirements.
  33. Portico - Another Look at the History
      » 2002: Launch of Electronic Archiving Initiative by JSTOR
      » 2005: Portico launched
      » 2006: Portico ingests initial e-journal content into the archive
      » 2007: Portico makes first trigger title available. Devices: iPhone, Kindle 1
      » 2009: Portico ingests initial e-book content. Devices: Kindle 2, Nook
      » 2010: Portico ingests initial d-collection content. Devices: iPad 1, Nook Color
      » 2011: iPad 2, Kindle Fire, Nook Simple Touch; ePub3
      » 2012: iPad 3
  34. Usability: Anticipated usage …
  35. Usability: … and new usage
  36. Authenticity, Discoverability: Preservation Context
  37. Context
  38. Context
  39. Context
  40. Context
  41. Context
  42. Context
  43. ...
  44. Formats: Packages
  45. Formats: Packages
  46. Formats: Packages
  47. E-Book Packages in Portico Submissions
      Flat directory
      » ONIX XML file with bibliographic metadata, one PDF file per book
        – Front cover image JPG files
  48. E-Book Packages in Portico Submissions
      TAR file (multiple books per file)
      » XML manifest file
      » One directory for each book
        – Proprietary XML file (3 possible versions of XML) with bibliographic metadata
        – Subdirectory with files for front matter “chapters” (XML, PDF, OCR of PDF)
        – Subdirectory with files for regular “chapters” (XML, PDF, OCR of PDF)
        – Subdirectory with files for back matter “chapters” (XML, PDF, OCR of PDF)
        – Subdirectory with TIFF file for cover image of book
  49. E-Book Packages in Portico Submissions
      ZIP file (sometimes one book per file, sometimes multiple books)
      » Sometimes flat (all books at one level)
      » Sometimes one directory for each book
        – Sometimes cover images (JPG or TIFF)
        – Sometimes one PDF for entire book in addition to PDF for each chapter
      » Sometimes a manifest
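
Slides 47 to 49 show how much publisher packages vary. As an illustration of the first step an ingest workflow might take, here is a minimal sketch that summarizes a package from its list of member paths (for example, the output of `zipfile.ZipFile.namelist()` or `tarfile.TarFile.getnames()`). The function name, the "flat" versus "per-directory" labels, and the rule that a manifest has "manifest" in its file name are assumptions made for this example, not Portico's actual logic.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_package(member_names):
    """Summarize layout, manifest presence, and file types of a submission package."""
    # Directory entries in ZIP listings end with "/"; keep only files.
    files = [PurePosixPath(name) for name in member_names if not name.endswith("/")]
    # Any file below the top level implies at least one directory.
    top_dirs = sorted({path.parts[0] for path in files if len(path.parts) > 1})
    extensions = Counter(path.suffix.lower().lstrip(".") for path in files)
    return {
        "layout": "per-directory" if top_dirs else "flat",
        "directories": top_dirs,
        # Assumption: a manifest is recognizable by its file name.
        "has_manifest": any("manifest" in path.name.lower() for path in files),
        "extensions": dict(extensions),
    }


# Hypothetical listings shaped like the flat and per-book layouts on the slides.
flat = ["onix.xml", "book1.pdf", "book1_cover.jpg"]
nested = [
    "manifest.xml",
    "bookA/metadata.xml",
    "bookA/chapters/ch01.pdf",
    "bookA/cover/cover.tif",
]
print(summarize_package(flat))
print(summarize_package(nested))
```

A summary like this only describes what arrived; deciding which file is the bibliographic metadata or which PDF is the whole book still depends on publisher-specific knowledge, which is the point the slides make with their repeated "sometimes".
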
  50. Formats: Text Content
      Rendered text: Hello, World!!
  51. Formats: Text Content
      Rendered text: Hello, World!!
      PDF content stream:
          BT
          /H2 <</MCID 0 >>BDC
          /CS0 cs 0.31 0.506 0.741 scn
          /TT0 1 Tf
          -0.004 Tc 0.006 Tw 12.96 0 0 12.96 72 697.68 Tm
          [(H)-4(e)-1(l)-1(l)-11 (o,)-3( W)-15(or)-6 (l)-11(d!)-12(!)]TJ
          0 Tc 0 Tw 6.481 0 Td
          ( )Tj
          EMC
          ET
  52. Formats: Text Content
      Rendered text: Hello, World!!
      HTML source:
          <html>
          <head>
          <style type="text/css">
          <!--
          p { color: #4F81BD; font-family: serif; font-weight: bold; font-size: 13pt; }
          -->
          </style>
          </head>
          <body><p>Hello, World!! </p></body>
          </html>
  53. Trade-offs: Expressiveness vs. Simplicity
      Rendered text: Hello, World!!
  54. Formats: Rich Content
      Rendered text: Hello, World!!
  55. Formats: Rich Content
      Rendered text: Hello, World!!
      PDF content stream:
          BT
          /H2 <</MCID 0 >>BDC
          /CS0 cs 0.31 0.506 0.741 scn
          /TT0 1 Tf
          -0.004 Tc 0.006 Tw 12.96 0 0 12.96 264 697.68 Tm
          [(H)-4(e)-1(l)-2(l)-11(o,)-3( W)-15(or)-6 (l)-11(d!)-12(!)]TJ
          0 Tc 0 Tw 6.481 0 Td
          ( )Tj
          EMC
          /P <</MCID 1 >>BDC
          /CS1 cs 0 scn
          /TT1 1 Tf
          11.04 0 0 11.04 72 682.08 Tm
          ( )Tj
          EMC
          /P <</MCID 2 >>BDC
          36.478 -24.185 Td
          ( )Tj
          EMC
          ET
          /Figure <</MCID 3 >>BDC
          q
          /GS0 gs
          336 0 0 252 139.1000061 414.6812744 cm
          /Im0 Do
          Q
          EMC
  56. Formats: Rich Content
      Rendered text: Hello, World!! (iText RUPS)
  57. Formats: Rich Content
      Rendered text: Hello, World!!
      HTML source:
          <html>
          <head>
          <style type="text/css">
          <!--
          p { color: #4F81BD; font-family: serif; font-weight: bold; font-size: 13pt; }
          -->
          </style>
          </head>
          <body><p>Hello, World!! <br/><span><IMG width="447" height="336" src="images/Image_001.jpg"/></span></p></body>
          </html>
  58. Trade-offs: Encapsulation vs. Articulation
      Encapsulated (PDF):
          mydir/
            myFile.pdf
      Articulated (HTML):
          mydir/
            myFile.html
            images/
              Image01.jpg
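
The articulation side of the trade-off on slide 58 means an HTML rendition is only complete if every file it points to is preserved with it. A minimal sketch of listing those dependencies with Python's standard `html.parser` follows; the class and function names, and the choice of which tags to inspect, are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags whose src/href attributes point at files the page depends on (illustrative subset).
RESOURCE_TAGS = {"img", "link", "script", "audio", "video", "source"}


class ResourceCollector(HTMLParser):
    """Collect src/href values from tags that pull in external files."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.resources.append(value)


def external_resources(html_text):
    """Return the external files an HTML document refers to, in document order."""
    collector = ResourceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.resources


rich = (
    '<html><body><p>Hello, World!! <br/>'
    '<span><IMG width="447" height="336" src="images/Image_001.jpg"/></span>'
    '</p></body></html>'
)
print(external_resources(rich))
```

The PDF in the encapsulated layout carries its image inside the single file, so there is nothing comparable to enumerate; the cost is the less readable content stream shown on slide 55.
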
  59. E-book formats in Portico Submissions
      PDF
      » One file per chapter
      » One file per book
      TIFF
      » One file per page
      JPEG
      » One file per page
      XML
      » For bibliographic metadata
      » Proprietary
      » ONIX variants
      » NLM variants
  60. Looking ahead: EPUB 3
      EPUB 3 (http://idpf.org/epub/30)
      » “EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content -- including HTML5, CSS, SVG, images, and other resources -- for distribution in a single-file format.”
  61. Looking ahead: EPUB 3
      EPUB 3
      » Web standards for key component technologies
      » Free and open specification
      » Must work in at least some appliance
        – Outside publisher’s own workflow
  62. EPUB3 Packaging
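
The packaging slide carries no text in the transcript. As a hedged illustration of the "single-file format" mentioned on slide 60: the EPUB Open Container Format packages a publication as a ZIP archive whose first entry is an uncompressed file named `mimetype` containing `application/epub+zip`, with a `META-INF/container.xml` file pointing to the package document. The sketch below checks only those two rules; it is not a validator, and the function names are hypothetical.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_container(source):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    with zipfile.ZipFile(source) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first ZIP entry is not 'mimetype'")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if archive.read(entries[0]) != EPUB_MIMETYPE:
                problems.append("'mimetype' content is not application/epub+zip")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


def build_demo_package():
    """Build a tiny in-memory archive that satisfies the two checks (not a full EPUB)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry is written first and stored without compression.
        archive.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
    buffer.seek(0)
    return buffer


print(check_epub_container(build_demo_package()))
```

For real preservation work a full conformance checker would be the appropriate tool; this sketch only shows why the `mimetype` entry matters when a reading system or archive tries to identify a package.
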
  63. EPUB3 Formats
      “Profiles” of standard formats for authoring content
      » XHTML5, SVG 1.1, CSS 2.1, CSS 3
        – Constraints (extensions to HTML5, constraints on SVG)
        – Specs a “moving target”
      Conforming readers must support rendition of certain formats
      » Image, audio, video
        – Defined fallbacks
      Globalization, Encoding, Fonts
  64. Complications: The New “Browser Wars”
      Amazon
      » Announces it is replacing MOBI with K8
      iBooks
      » Different mimetype
      » Proprietary extension of CSS Media Queries
      » Proprietary XML namespace
      » Etc.
  65. Complications: “More What You’d Call ‘Guidelines’ Than Actual Rules”
      Pirates of the Caribbean: The Curse of the Black Pearl. The Walt Disney Company (2003)
  66. Questions or Comments?
      Sheila Morrissey
      sheila.morrissey@ithaka.org
      @sheilaMorr
      www.portico.org