CTS at LC - Access 2010

CTS at LC, talk given at Access 2010 in Winnipeg.

Published in: Technology, News & Politics

  1. 1. CTS* at LC** (* Content Transfer Services, ** Library of Congress). Daniel Chudnov, dchud at loc gov, 2010-10-15. Access 2010, Winnipeg. follow along at slideshare.net/dchud
  2. 2. slideshare.net/dchud
  3. 3. work in progress
  4. 4. transfer, verification, inventory, reporting, workflow, notification, status, access
  5. 5. hard to show but that won’t stop me
  6. 6. tinyurl.com/cts2010
  7. 7. when i’m done this should make sense
  8. 8. NDNP
  9. 9. publishing breaking news* online (* 100 years after it happens)
  10. 10. chroniclingamerica.loc.gov
  11. 11. 1,442,264 pages last year at Access
  12. 12. 2,692,369 pages this year at Access
  13. 13. went live spring 2007
  14. 14. first two years: 1.4M; last year: 2.7M
  15. 15. 56 TB content, 117 TB in copies
  16. 16. how?
  17. 17. 1. built a better access system
  18. 18. faster ingest: from 1 month to 1 day
  19. 19. 2. workflow
  20. 20. CTS does this
  21. 21. [chart; axes: months, batches, page counts]
  22. 22. [chart annotations: first batch received 2005-10; the gap; press event; went live spring 2007]
  23. 23. [chart annotations: 2009-09, first CTS workflow; 3-4 month lag; 2-3 month lag; 1-2 month lag; 2010-09]
  24. 24. ingest rate approaches receipt rate
  25. 25. this makes us smile
  26. 26. content transfer services
  27. 27. some requirements
  28. 28. LC started digitizing in the 1980s
  29. 29. we have a lot of stuff
  30. 30. in a lot of places
  31. 31. distributed computing environment
  32. 32. commercial MFT* license (* Managed File Transfer)
  33. 33. buy or build? why, both, thank you
  34. 34. 100s of collections
  35. 35. dozens of curatorial organizations
  36. 36. lots more stuff coming every day
  37. 37. a long term project
  38. 38. collecting and making available
  39. 39. services for transfer of content
  40. 40. any content
  41. 41. lots of transfers “movage”
  42. 42. several services
  43. 43. transfer, verification, inventory, reporting, workflow, notification, status, access
  44. 44. transfer across systems, organizations, time
  45. 45. content transfer is risky
  46. 46. copies fail
  47. 47. bits go bad
  48. 48. drives get lost
  49. 49. you forget what you did
  50. 50. you forget what you had
  51. 51. people retire
  52. 52. software breaks
  53. 53. hardware breaks
  54. 54. three blizzards in DC
  55. 55. CTS helps make transfers reliable and resilient
  56. 56. reliable: know when you’ve succeeded
  57. 57. BagIt: packing slip for data
  58. 58. data in a Bag
          .
          |-- bag-info.txt
          |-- bagit.txt
          |-- data
          |   |-- batch.xml
          |   |-- batch_1.xml
          |   |-- batch_ne_dewitt_rework
          |   |   |-- 00206538016_batch.xml
          |   |   |-- 00206538028_batch.xml
          |   |   `-- sn99021999
          |   `-- sn99021999
          |       |-- 00206538016
          |       |   |-- 0000.jp2
          |       |   |-- 0000.pdf
          |       |   |-- 0000.tif
          |       |   |-- 0000.xml
          |       |   |-- 0001.jp2
          |       |   |-- 0001.pdf
          |       |   |-- 0001.tif
          |       |   |-- 0001.xml
  59. 59. (same listing) bagit.txt: identifies a bag
  60. 60. (same listing) data: where the data starts
  61. 61. (same listing, ending as follows) manifest-md5.txt and tagmanifest-md5.txt: packing slip
          |       |   |-- 0001.pdf
          |   ...
          |-- manifest-md5.txt
          `-- tagmanifest-md5.txt
  62. 62. 71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
          c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
          a59795bd1584532d5cbc0b1d82f75cf8 data/sn99021999/00206538016/1880061401/0593.pdf
          3c64fac7e2d49671e0d93908ae42a779 data/sn99021999/00206539616/1888101801/0905.xml
          03158a560baa7479b3805d2b45ee02cd data/sn99021999/00206538028/1880111501/0405.tif
          fa56ea18580e1446939ed62709e5b2db data/sn99021999/00206538077/1883061901/1145.pdf
          bf4fb83ff8305e8256970a3466c1a12d data/sn99021999/00206538120/1885061501/0043.pdf
          8f3649fc812de74b9d9443ee90a8ac9c data/sn99021999/00206538120/1885111101/1109.tif
          e0b83a7f9ca228271fdaecf6348e1cec data/sn99021999/00206538120/1885101201/0871.xml
          1c2f84e12792c123ba0aabedd0c0bbad data/sn99021999/00206538107/1884071401/0197.xml
          080e557fe9f68037605e5b80df4bc4ac data/sn99021999/0020653820A/1888050701/0543.tif
          532efe32c156459d9d9589caf618f502 data/sn99021999/00206538120/1885071401/0250.tif
          ce607af59a96f2656d9448f38ffda072 data/sn99021999/0020653820A/1888052801/0731.pdf
          60b626d8fd40aca1b425e86a004bb055 data/sn99021999/00206539628/1888111801/0088.xml
          a467cd62350334c7aa83cf1e9056c1c6 data/sn99021999/00206539616/1888091701/0629.jp2
          1a434f7a4d843a2c8ffe8d0824fafc3f data/sn99021999/00206538028/1880120801/0482.jp2
          22996d89b4a3334256afaddcaa0238d8 data/sn99021999/00206538016/1874102001/0259.jp2
          36f550da273ad4c592fee1761c98322a data/sn99021999/00206538016/1880052201/0518.jp2
          7f7ccec3f2afae896338498372fd476e data/sn99021999/00206539616/1888080101/0200.pdf
          c247a5d74d0e7f857c534d935661adbe data/sn99021999/00206538107/1884072601/0286.jp2
          4d497a18a154adcc8636239378ab340b data/sn99021999/00206539628/1889021101/0868.pdf
          2e8ca2558b54b5c49b2f20a355a60895 data/sn99021999/00206538065/1882092001/0136.xml
          fb71493048e5010100f18012f5060d42 data/sn99021999/00206538028/1880123001/0569.xml
          40b100432890b055a5defbfbea815d57 data/sn99021999/00206538107/1884090901/0590.xml
          46f6d61480dadc1c988b0baa4de8b6c4 data/sn99021999/00206539628/1888122801/0463.pdf
          1cb8af0648e8c9df395b63226fe7371f data/sn99021999/00206538016/1874101501/0244.pdf
          9257834023c683b02f354888b2740b8f data/sn99021999/00206539616/1888102301/0956.xml
          0d52b3b2b1c5459b7e8d500a8566b0bf data/sn99021999/00206538120/1885080801/0425.tif
  63. 63. indicates two things
  64. 64. 1 what i think i’m sending you
  65. 65. 2 whether you received it
  66. 66. just like a packing slip
  67. 67. works across space
  68. 68. works across systems
  69. 69. works across orgs
  70. 70. works across time
  71. 71. easy to make
  72. 72. md5deep
  73. 73. BIL: BagIt Library
  74. 74. Bagger: desktop GUI
  75. 75. BIL is free software; Bagger will be soon
  76. 76. sf.net/projects/loc-xferutils/
  77. 77. see also: BagIt in Wikipedia (edsu++)
  78. 78. reliability through bagging
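The packing-slip idea in the slides above can be sketched in Python with only the standard library. This is not the BagIt Library (BIL) or Bagger: the function names `write_manifest` and `verify_manifest` are hypothetical, and the sketch covers only the payload manifest (`manifest-md5.txt`), not `bagit.txt`, `bag-info.txt`, or the tag manifest.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Sender's side: list every file under data/ with its checksum."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            lines.append(f"{md5_of(path)}  {relative}")
    manifest = bag_dir / "manifest-md5.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> dict:
    """Receiver's side: map each listed path to 'ok', 'missing', or 'corrupt'."""
    results = {}
    manifest = bag_dir / "manifest-md5.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file():
            results[relative] = "missing"
        elif md5_of(target) != expected:
            results[relative] = "corrupt"
        else:
            results[relative] = "ok"
    return results
```

The same file answers both questions from the slides: the sender writes it to say what was sent, and the receiver re-runs the checksums to learn whether it all arrived.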
  79. 79. resilience through persistence
  80. 80. verify that copies succeed
  81. 81. know when copies fail
  82. 82. repeat until copies succeed
  83. 83. debug & diagnose
  84. 84. record all of it
  85. 85. know what you have, know what you did
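The verify, repeat, and record steps above might look roughly like this for a single file. It is a sketch under assumptions, not how CTS does it: `copy_until_verified`, the `max_attempts` default, and the shape of the event records are all hypothetical.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def file_md5(path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_until_verified(source, destination, max_attempts=3, copy=shutil.copyfile):
    """Copy, verify by checksum, and retry on failure; return every attempt as an event."""
    expected = file_md5(source)
    events = []
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, destination)
            ok = file_md5(destination) == expected
            detail = "checksum match" if ok else "checksum mismatch"
        except OSError as error:
            ok, detail = False, f"copy error: {error}"
        # Record failures as well as successes, so the history can be diagnosed later.
        events.append({
            "attempt": attempt,
            "ok": ok,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if ok:
            break
    return events
```

Passing the copy function in as a parameter is a design choice for the sketch: it lets a different transfer mechanism be swapped in without touching the verify-and-record loop.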
  86. 86. inventory
  87. 87. BagIt checksums in a DB
  88. 88. content properties: project, process, type
  89. 89. event timeline
  90. 90. receipt, verification, QR, copies, accept/reject, ingest/release, comments
  91. 91. life cycle of some set of content
  92. 92. basic facts, project, all the copies, details
  93. 93. event timeline
  94. 94. comments along the way
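One way to picture an inventory of BagIt checksums plus an event timeline is a small relational schema. The sketch below uses SQLite for brevity (the slides say CTS itself uses MySQL); every table, column, and function name is hypothetical, and the timestamps in any usage are made up.

```python
import sqlite3

# Hypothetical schema: one row per bag, per file checksum, and per event.
SCHEMA = """
CREATE TABLE bag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    project TEXT NOT NULL
);
CREATE TABLE file_checksum (
    bag_id INTEGER NOT NULL REFERENCES bag(id),
    path TEXT NOT NULL,
    md5 TEXT NOT NULL,
    PRIMARY KEY (bag_id, path)
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    bag_id INTEGER NOT NULL REFERENCES bag(id),
    kind TEXT NOT NULL,          -- e.g. receipt, verification, QR, ingest
    occurred_at TEXT NOT NULL,   -- ISO 8601 timestamp
    comment TEXT
);
"""


def open_inventory(path=":memory:"):
    """Open (or create) an inventory database with the schema above."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_bag(conn, name, project, manifest_lines):
    """Register a bag and load its manifest lines ('<md5> <path>') as checksums."""
    bag_id = conn.execute(
        "INSERT INTO bag (name, project) VALUES (?, ?)", (name, project)
    ).lastrowid
    rows = []
    for line in manifest_lines:
        md5, path = line.split(maxsplit=1)
        rows.append((bag_id, path, md5))
    conn.executemany(
        "INSERT INTO file_checksum (bag_id, path, md5) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return bag_id


def record_event(conn, bag_id, kind, occurred_at, comment=None):
    """Append one event to the bag's timeline."""
    conn.execute(
        "INSERT INTO event (bag_id, kind, occurred_at, comment) VALUES (?, ?, ?, ?)",
        (bag_id, kind, occurred_at, comment),
    )
    conn.commit()


def timeline(conn, bag_id):
    """Return (occurred_at, kind, comment) rows in chronological order."""
    return conn.execute(
        "SELECT occurred_at, kind, comment FROM event "
        "WHERE bag_id = ? ORDER BY occurred_at, id",
        (bag_id,),
    ).fetchall()
```

With checksums and events in one place, questions such as "what did we receive, and what happened to it" become queries rather than archaeology.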
  95. 95. life cycle of NDNP batch
  96. 96. two key things
  97. 97. 1 automated workflow using jBPM
  98. 98. this part
  99. 99. process definition: manages the steps, doesn’t let us forget
  100. 100. 2 when content partners call we can answer their questions
  101. 101. reporting: answering our own questions
  102. 102. annual reports: very important
  103. 103. file counts, overall size, etc.
  104. 104. used to be very difficult to determine
  105. 105. now: immediate, anytime
  106. 106. mostly NDNP newer partners
  107. 107. also project reporting / planning
  108. 108. NDNP batches - one awardee
  109. 109. NDNP batches - all awardees (same data, CSV export)
  110. 110. provides 5000’ view
  111. 111. workflow
  112. 112. working status at a glance
  113. 113. a personalized view
  114. 114. overview of a whole project
  115. 115. overview of a system, overview of a person
  116. 116. not exactly “Facebook for bags” but kinda
  117. 117. but wait, there’s more
  118. 118. browse live copies
  119. 119. go right to the content
  120. 120. many benefits
  121. 121. aaaand... a RESTy web API
  122. 122. we can build complex workflows with inventory and reporting in CTS
  123. 123. we can build QR/workflow/auditing outside of CTS with inventory and reporting through CTS
  124. 124. CTS: java, spring, mysql, hibernate, velocity, tiles, jquery, jBPM, jetty
  125. 125. NDNP: python, django, mysql, solr, apache
  126. 126. nice clean interfaces, nice separation
  127. 127. different coders, different styles
  128. 128. same benefits from using CTS
  129. 129. what’s next?
  130. 130. many more content collections
  131. 131. now: NDNP, Web Archives, NDIIPP, Copyright Cards
  132. 132. next: P&P, G&M, WDL, AFC, Twitter, Copyright EDeposit
  133. 133. also coming: more simple workflows
  134. 134. “Receive and Copy”
  135. 135. fits many use cases: receive bag/verify, copy to archival, copy to access
  136. 136. works for recon, works for new stuff
  137. 137. and, get past typical problems: permissions, insufficient storage, failed copies
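A rough sketch of a "Receive and Copy" style workflow: verify the received bag, copy it to archival storage, copy it to access storage, and stop at the first problem. The real CTS workflows are jBPM process definitions in Java; this Python version, its function names, and the `verify` callable (any function taking a bag directory and returning a bool) are assumptions for illustration only.

```python
import shutil
from pathlib import Path


def bag_size(bag_dir: Path) -> int:
    """Total size in bytes of all files in the bag."""
    return sum(p.stat().st_size for p in bag_dir.rglob("*") if p.is_file())


def copy_step(bag_dir: Path, target_root: Path, verify) -> str:
    """Copy the bag under target_root and verify the copy; return a status string."""
    if shutil.disk_usage(target_root).free < bag_size(bag_dir):
        return "insufficient storage"
    target = target_root / bag_dir.name
    try:
        shutil.copytree(bag_dir, target)
    except PermissionError:
        return "permissions"
    except OSError as error:
        # Covers an existing target and per-file failures, which
        # copytree collects and raises as shutil.Error (an OSError).
        return f"failed copy: {error}"
    return "ok" if verify(target) else "failed copy: verification"


def receive_and_copy(bag_dir: Path, archival_root: Path, access_root: Path, verify) -> list:
    """Run the steps in order, stopping at the first failure; return (step, status) pairs."""
    steps = [
        ("receive bag/verify", lambda: "ok" if verify(bag_dir) else "failed verification"),
        ("copy to archival", lambda: copy_step(bag_dir, archival_root, verify)),
        ("copy to access", lambda: copy_step(bag_dir, access_root, verify)),
    ]
    log = []
    for name, run in steps:
        status = run()
        log.append((name, status))
        if status != "ok":
            break
    return log
```

Returning the step log, rather than raising on failure, keeps a record of how far a transfer got, which is the part a person would otherwise have to remember.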
  138. 138. connection with high expectation
  139. 139. and, finally, a UI redesign
  140. 140. thanks!
  141. 141. BagIt - wikipedia; sf.net/projects/loc-xferutils/; hooray for protovis; @dchud - dchud at loc gov
