Forward in Reverse

Eric, Mike and Steve detail the UW Forward system architecture, from ingest to user interface.

Published in: Technology

  1. 1. Forward in Reverse<br />A Gentle Overview Of Forward System Architecture<br />Eric, Mike & Steve – WiLSWorld 2010<br />
  2. 2. Outline<br />Intro to Forward with Demo<br />Batch Processing (Backend)<br />Web Application (Frontend)<br />Challenges<br />Q&As throughout<br />
  3. 3. Intro & Demo<br />
  4. 4. http://forward.library.wisconsin.edu<br />
  5. 5. Batch Processing<br />
  6. 6. We have gobs & gobs of data.<br />
  7. 7. 1) Extract it<br />
  8. 8. 1a) ILS Data<br />
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13. Sort, Deduplicate, Merge<br />
  14. 14. Antique Style Key<br />By Stars*Go*Blue<br />http://www.flickr.com/photos/artbydebora/1406682449/<br />
  15. 15. Common Identifier = OCLC Number<br />
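The sort, deduplicate and merge step keyed on the OCLC number can be sketched roughly as follows. This is a minimal Python illustration, not the local C code the deck mentions: the record dictionaries, the field names (`oclc`, `campus`, `urls`, `formats`), the campus names and the second OCLC number are all hypothetical. It only shows the idea of collapsing records that share an identifier while keeping the union of the fields the deck later lists under "Why Merge?".

```python
from collections import defaultdict


def merge_by_oclc(records):
    """Group records by OCLC number and merge each group into one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["oclc"]].append(rec)

    merged = []
    for oclc in sorted(groups):  # sorted output, one record per identifier
        group = groups[oclc]
        merged.append({
            "oclc": oclc,
            "title": group[0]["title"],  # assumption: first title wins
            "urls": sorted({u for r in group for u in r.get("urls", [])}),
            "formats": sorted({f for r in group for f in r.get("formats", [])}),
            "holdings": sorted({r["campus"] for r in group}),
        })
    return merged


# Hypothetical sample input: two campuses hold the same title.
records = [
    {"oclc": "ocm52165958", "title": "Wittgenstein, aesthetics, and philosophy",
     "campus": "Milwaukee", "formats": ["Book"], "urls": []},
    {"oclc": "ocm52165958", "title": "Wittgenstein, aesthetics, and philosophy",
     "campus": "Madison", "formats": ["Book"], "urls": []},
    {"oclc": "ocm00000001", "title": "Made-up second title",
     "campus": "Madison", "formats": ["Book"], "urls": []},
]

merged = merge_by_oclc(records)
```

Here three input records become two merged records, and the shared title carries holdings for both campuses.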
  16. 16. Catalog Extract Processing Details<br />14 Voyager instances<br />13M MARC bibliographic records extracted<br />Approximately 14 hours<br />Local C code<br />Sorted, deduplicated and merged output:<br />8M records<br />10GB raw MARC data<br />
  20. 20. Why Merge?<br />URLs<br />Formats<br />Holdings<br />
  22. 22. 1b) Digital Collection Data<br />
  23. 23.
  24. 24. Fedora Extract Processing Details<br />1 Fedora repository<br />13K “First Class” XML objects extracted<br />Approximately 4 hours<br />Repository query language<br />XML output:<br />METS XML package<br />Structural XML<br />MODS bibliographic XML<br />41MB XML data<br />
  30. 30. 2) Index it<br />
  31. 31. We take raw library data and process it with MARC/XML parsing tools and local parsing rules in order to build a Solr search index.<br />
  32. 32. Raw data (MARC & METS XML) <br />Parsing libraries (Java code: marc4j, SAXParser)<br />Local code that defines parsing rules<br />Solr index<br />
  33. 33. 1. Raw data<br />
  34. 34. LEADER 02000cam a22003734a 45 0<br />001 6939454<br />005 20051208125417.0<br />008 051104s2004 enka $b 001 0 eng <br />010 $a 2003045349 <br />035 $a (OCoLC)ocm52165958 <br />040 $a DLC $c DLC $d XMA $d BAKER $d UKM <br />015 $a GBA430162 $2 bnb<br />016 7$a 012906573 $2 Uk<br />020 $a 0754605175 (alk. paper) <br />024 $a 99811375970 <br />042 $a pcc<br />049 $a GZMA <br />050 00$a B3376.W564 $b W55355 2004 <br />082 00 $a 111/.85/092 $2 21 <br />245 00 $a Wittgenstein, aesthetics, and philosophy / $c edited by Peter B. Lewis. <br />260 $a Aldershot, Hants, England ; $a Burlington, VT : $b Ashgate, $c c2004. <br />300 $a xii, 255 p. : $b ill. ; $c 24 cm. <br />440 0 $a Ashgate Wittgensteinian studies <br />505 0$a Wittgenstein and the aesthetic domain / Kjell S. Johannessen -- 2. Wittgenstein, anti-essentialism and the definition of art / Terry Diffey -- 3. Rules, creativity and pictures : Wittgenstein's Lectures on aesthetics / David Novitz -- 4. Criticism without theory / Mark W. Rove -- 5. On aesthetic reactions and changing one's mind / Lars Hertzberg -- 6. Wittgenstein and the arts : understanding and performing / Graham McFee -- 7. Wittgenstein's music / R.A. Sharpe -- 8. Wittgenstein on music and language / Oswald Hanfling -- 9. Ethics and aesthetics are one / Carolyn Wilde -- 10. Fiction and reality in the arts / Ilham Dilman -- 11. Literature, human understanding and morality / Ben Tilghman -- 12. 'The self, thinking' : Wittgenstein, Augustine and the autobiographical situation / Garry L. Hagberg<br />504 $a Includes bibliographical references (p. 235-247) and index.<br />
  35. 35. 02000cam a22003734a 45 001000800000005001700008008004100025010001700066035002300083040003000106015001900136016001800155020002800173024001600201042000800217049000900225050002800234082002000262245007400282260006800356300003400424440003600458505081100494504006401305600005001369700002501419938007101444945001901515946003001534946001301564947002101577948001601598994001201614693945420051208125417.0051104s2004 enkab 001 0 eng a 2003045349 a(OCoLC)ocm52165958 aDLCcDLCdXMAdBAKERdUKM aGBA4301622bnb7 a0129065732Uk a0754605175 (alk. paper) a99811375970 apcc aGZMA00aB3376.W564bW55355 200400a111/.85/09222100aWittgenstein, aesthetics, and philosophy /cedited by Peter B. Lewis. aAldershot, Hants, England ;aBurlington, VT :bAshgate,cc2004. axii, 255 p. :bill. ;c24 cm. 0aAshgate Wittgensteinian studies0 aWittgenstein and the aesthetic domain / Kjell S. Johannessen -- 2. Wittgenstein, anti-essentialism and the definition of art / Terry Diffey -- 3. Rules, creativity and pictures : Wittgenstein's Lectures on aesthetics / David Novitz -- 4. Criticism without theory / Mark W. Rove -- 5. On aesthetic reactions and changing one's mind / Lars Hertzberg -- 6. Wittgenstein and the arts : understanding and performing / Graham McFee -- 7. Wittgenstein's music / R.A. Sharpe -- 8. Wittgenstein on music and language / Oswald Hanfling -- 9. Ethics and aesthetics are one / Carolyn Wilde -- 10. Fiction and reality in the arts / IlhamDilman -- 11. Literature, human understanding and morality / Ben Tilghman -- 12. 'The self, thinking' : Wittgenstein, Augustine and the autobiographical situation / Garry L. HagbergaIncludes bibliographical references (p. 235-247) and index.10aWittgenstein, Ludwig,d1889-1951xAesthetics.1 aLewis, Peter,d1947- aBaker & TaylorbBKTYc99.95d99.95i0754605175n0004227086sactive c1d89087961587 a714694b2005-11-23c81.86 c99.95d1 aHEUR 4801bm,stk aSCNd348032 a92bGZM<br />
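The long digit run at the start of the raw record above is the MARC leader followed by the directory. As a rough sketch of how a parsing library might read it, the Python below splits a directory into 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), following the MARC 21 record structure. It is an illustration only and not marc4j's API. The leader string is an assumption: it is the slide's leader with the final characters restored to the standard `4500`, which the transcript appears to have garbled. The directory string is the first four entries copied from the record above.

```python
LEADER_LEN = 24   # a MARC 21 leader is always 24 characters
ENTRY_LEN = 12    # tag (3) + field length (4) + starting position (5)


def parse_leader(leader):
    """Pull the record length and base address of data out of the leader."""
    return {
        "record_length": int(leader[0:5]),
        "base_address": int(leader[12:17]),
    }


def parse_directory(directory):
    """Split a directory string into (tag, length, start) tuples."""
    entries = []
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN]
        entries.append((entry[0:3], int(entry[3:7]), int(entry[7:12])))
    return entries


# Assumption: leader reconstructed with the standard "4500" ending.
leader = "02000cam a22003734a 4500"
# First four directory entries, copied from the raw record on the slide.
directory = "001000800000005001700008008004100025010001700066"

info = parse_leader(leader)
entries = parse_directory(directory)

# The directory sits between the leader and the base address and ends
# with a one-byte field terminator, so its entry count can be derived.
expected_entries = (info["base_address"] - LEADER_LEN - 1) // ENTRY_LEN
```

Each entry's starting position equals the previous start plus the previous length (0 + 8 = 8, 8 + 17 = 25, 25 + 41 = 66), which is a handy sanity check when a record looks corrupt.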
  36. 36.
  37. 37. 2. MARC/XML parsing libraries<br />
  38. 38.
  39. 39.
  40. 40. 02000cam a22003734a 45 001000800000005001700008008004100025010001700066035002300083040003000106015001900136016001800155020002800173024001600201042000800217049000900225050002800234082002000262245007400282260006800356300003400424440003600458505081100494504006401305600005001369700002501419938007101444945001901515946003001534946001301564947002101577948001601598994001201614693945420051208125417.0051104s2004 enkab 001 0 eng a 2003045349 a(OCoLC)ocm52165958 aDLCcDLCdXMAdBAKERdUKM aGBA4301622bnb7 a0129065732Uk a0754605175 (alk. paper) a99811375970 apcc aGZMA00aB3376.W564bW55355 200400a111/.85/09222100aWittgenstein, aesthetics, and philosophy /cedited by Peter B. Lewis. aAldershot, Hants, England ;aBurlington, VT :bAshgate,cc2004. axii, 255 p. :bill. ;c24 cm. 0aAshgate Wittgensteinian studies0 aWittgenstein and the aesthetic domain / Kjell S. Johannessen -- 2. Wittgenstein, anti-essentialism and the definition of art / Terry Diffey -- 3. Rules, creativity and pictures : Wittgenstein's Lectures on aesthetics / David Novitz -- 4. Criticism without theory / Mark W. Rove -- 5. On aesthetic reactions and changing one's mind / Lars Hertzberg -- 6. Wittgenstein and the arts : understanding and performing / Graham McFee -- 7. Wittgenstein's music / R.A. Sharpe -- 8. Wittgenstein on music and language / Oswald Hanfling -- 9. Ethics and aesthetics are one / Carolyn Wilde -- 10. Fiction and reality in the arts / IlhamDilman -- 11. Literature, human understanding and morality / Ben Tilghman -- 12. 'The self, thinking' : Wittgenstein, Augustine and the autobiographical situation / Garry L. HagbergaIncludes bibliographical references (p. 235-247) and index.10aWittgenstein, Ludwig,d1889-1951xAesthetics.1 aLewis, Peter,d1947- aBaker & TaylorbBKTYc99.95d99.95i0754605175n0004227086sactive c1d89087961587 a714694b2005-11-23c81.86 c99.95d1 aHEUR 4801bm,stk aSCNd348032 a92bGZM<br />
  41. 41.
  42. 42.
  43. 43.
  44. 44. 3. Local code<br />
  45. 45.
  46. 46.
  47. 47. 4.<br />http://lucene.apache.org/solr/<br />
  48. 48. What is Solr?<br />An XML API over a Lucene search index.<br />
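To make "an XML API" concrete: documents are sent to Solr's update handler wrapped in an `<add><doc>` message with one `<field>` element per value. The Python sketch below builds such a message. The field names (`id`, `title`, `format`) are hypothetical and would have to match whatever the index schema defines; the deck does not show Forward's schema.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc):
    """Build a Solr <add> update message for a single document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Multi-valued fields are sent as repeated <field> elements.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names; the values come from the sample record.
xml_body = solr_add_xml({
    "id": "ocm52165958",
    "title": "Wittgenstein, aesthetics, and philosophy",
    "format": ["Book"],
})
```

The resulting string would be POSTed to the update handler and followed by a commit before the document becomes searchable.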
  49. 49.
  50. 50.
  51. 51. Access to Raw Formats<br />Raw MARC stored for Merged record<br />Live calls made to Fedora web services<br />
  52. 52. Data Refresh<br />Bibliographic: weekly<br />Circulation status: nightly<br />
  53. 53.
  54. 54. For more information, see<br />http://sdg.library.wisc.edu/blog/2010/03/03/solr-marc-indexing-based-on-diffs/<br />
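The linked post is not reproduced in this transcript, so the following is only a generic sketch of what diff-based indexing can look like, not necessarily the approach described there. It assumes a fingerprint of each record is stored after every run and compared against the new extract, so that only new or changed records are re-indexed and vanished records are deleted. All names are hypothetical.

```python
import hashlib


def fingerprint(raw):
    """Stable fingerprint of a raw record string."""
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def diff_extracts(previous_fps, current):
    """Compare stored fingerprints with the current extract.

    previous_fps: dict of record id -> fingerprint from the last run
    current:      dict of record id -> raw record text from this run
    Returns (ids to index, ids to delete).
    """
    to_index = [
        rid for rid, raw in current.items()
        if previous_fps.get(rid) != fingerprint(raw)
    ]
    to_delete = [rid for rid in previous_fps if rid not in current]
    return sorted(to_index), sorted(to_delete)


# Hypothetical data: "b" changed, "c" is new, "d" disappeared.
previous_fps = {
    "a": fingerprint("record a"),
    "b": fingerprint("record b, old"),
    "d": fingerprint("record d"),
}
current = {"a": "record a", "b": "record b, new", "c": "record c"}

to_index, to_delete = diff_extracts(previous_fps, current)
```

With millions of records and a weekly refresh, touching only the changed subset is what keeps the index rebuild within the batch window.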
  55. 55. Web Application<br />
  56. 56. Frontend?<br />(X)HTML<br />JavaScript<br />Cascading Style Sheets<br />Design<br />Information Architecture<br />User experience<br />Chrome (images, icons, pretty)<br />
  57. 57. Forward Colophon<br />ActiveRecordBaseWithoutTable (Rails plugin)<br />Apache<br />Blacklight (Rails plugin)<br />Blueprint CSS<br />Bookreader (jQuery)<br />Capistrano<br />Crontab<br />Engines (Rails plugin)<br />Fedora<br />Freebase API<br />GeoIP (Ruby gem)<br />Google Books API<br />Haml (Rails plugin)<br />Happymapper (Ruby gem)<br />HathiTrust API<br />jQuery<br />Ken (Ruby gem)<br />LowPro (Prototype JS)<br />MARC4J<br />Passenger (modrails)<br />Prototype JS<br />PostgreSQL<br />Raphael<br />Ruby on Rails<br />Shibboleth<br />Subversion<br />Solr / Lucene<br />Summon (Ruby gem)<br />UW-Madison Libraries Staff Directory API<br />UWDC (Rails plugin)<br />Voyager API<br />Tender love and attention<br />
  58. 58. Campus Affiliation<br />Users localize to a school, which allows us to scope many features to their campus.<br />GeoIP Ruby Gem<br />Match IP addresses with physical locations.<br />Raphaël—JavaScript Library <br />“Small JavaScript library that should simplify your work with vector graphics on the web”.<br />
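The localization idea can be illustrated with a small lookup that maps a visitor's IP address to a default campus. This sketch deliberately swaps the GeoIP database for a hand-written table of network ranges; the ranges are reserved documentation addresses and the campus names are placeholders, so nothing here reflects the real configuration.

```python
import ipaddress

# Hypothetical campus networks, using reserved documentation ranges.
CAMPUS_NETWORKS = {
    "Madison": ipaddress.ip_network("192.0.2.0/24"),
    "Milwaukee": ipaddress.ip_network("198.51.100.0/24"),
}


def campus_for_ip(ip, default=None):
    """Return the campus whose network contains the address, if any."""
    addr = ipaddress.ip_address(ip)
    for campus, network in CAMPUS_NETWORKS.items():
        if addr in network:
            return campus
    return default


on_campus = campus_for_ip("192.0.2.17")
off_campus = campus_for_ip("203.0.113.5")
```

A visitor who matches no range would fall back to choosing a campus by hand, for example from the splash page.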
  59. 59. Raphaël<br />SVG elements, like the circles and squares in the Forward splash page, can be treated as XHTML elements allowing us to manipulate them with JavaScript and CSS.<br />http://raphaeljs.com/<br />
  60. 60. Campus Homepage<br />Forward application stack:<br />Apache+Passenger (modrails)<br />Ruby on Rails<br />PostgreSQL<br />Apache Solr<br />
  63. 63. Apache+Passenger<br />Phusion Passenger is an Apache module, which makes deploying Ruby and Ruby on Rails applications on Apache a breeze.<br />http://www.modrails.com/<br />
  64. 64. Ruby on Rails<br />“Ruby on Rails is an open-source web framework that’s optimized for programmer happiness and sustainable productivity.”<br />http://rubyonrails.org/<br />
  65. 65. PostgreSQL<br />“PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.”<br />http://www.postgresql.org/<br />
  66. 66. Apache Solr<br />“Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.”<br />http://lucene.apache.org/solr/<br />
  67. 67. Results<br />
  68. 68. Results – Three columns<br />
  69. 69. Results – Data sources<br />
  70. 70. Results – Facets – Solr<br />
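Facet counts like those on the results page come back from Solr when a query asks for them with `facet=true` and one `facet.field` parameter per field. The Python sketch below only builds such a request URL. The base URL and the facet field names (`format`, `language`) are assumptions, since the deck does not list Forward's actual fields.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that requests facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # facet.field is repeated once per field to facet on.
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "/select?" + urlencode(params)


# Hypothetical Solr location and field names.
url = facet_query_url(
    "http://localhost:8983/solr", "wittgenstein", ["format", "language"]
)
```

The response then carries, next to the matching documents, a count per value for each requested field, which the frontend can render as clickable filters.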
  71. 71. Results – Solr + PostgreSQL + APIs<br />
  72. 72. Results – Context – APIs <br />
  73. 73. Results – Three main columns<br />
  74. 74. Results – CSS grid<br />
  75. 75. Blueprint<br />“Blueprint is a CSS framework, which aims to cut down on your development time. It gives you a solid foundation to build your project on top of, with an easy-to-use grid, sensible typography, useful plugins, and even a stylesheet for printing.”<br />http://blueprintcss.org/<br />
  76. 76.
  77. 77. Show – Book<br />
  78. 78. Show – Image<br />
  79. 79. Show – Full Text Book<br />
  80. 80. Show – View Full Text Book<br />
  81. 81. BookReader<br />“The Internet Archive BookReader is used to view books from the Internet Archive online and can also be used to view other books. ”<br />http://github.com/openlibrary/bookreader<br />
  82. 82. Challenges<br />
  83. 83. Challenges<br />Merging MARC, METS extracts<br />Batch processing time (Time/CPU constraints)<br />Page level indexing (Bookviewer - memory/disk constraints)<br />Voyager API<br />Organization challenges<br />big project, small shop<br />dealing with vendor silos<br />multiple cataloging standards<br />quality of services challenges<br />
  84. 84. Thanks!<br />
