Forward in Reverse

Eric, Mike and Steve detail the UW Forward system architecture, from ingest to user interface.

  1. Forward in Reverse
     A Gentle Overview of Forward System Architecture
     Eric, Mike & Steve – WiLSWorld 2010
  2. Outline
     - Intro to Forward with Demo
     - Batch Processing (Backend)
     - Web Application (Frontend)
     - Challenges
     - Q&As throughout
  3. Intro & Demo
  4. http://forward.library.wisconsin.edu
  5. Batch Processing
  6. We have gobs & gobs of data.
  7. 1) Extract it
  8. 1a) ILS Data
  13. Sort, Deduplicate, Merge
  14. Antique Style Key
      By Stars*Go*Blue
      http://www.flickr.com/photos/artbydebora/1406682449/
  15. Common Identifier = OCLC Number
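The merge step keyed on the OCLC number can be sketched roughly as follows. This is a minimal illustration in Python, not the local C code the deck mentions: the record layout (a dict with `035a`, `urls`, `formats`, `holdings` keys), the campus names, and the example URL are all assumptions made for the example. The three merged attributes are the ones the deck lists as reasons to merge.

```python
import re
from collections import defaultdict

def normalize_oclc(raw):
    """Return the numeric part of an 035 $a value such as '(OCoLC)ocm52165958', or None."""
    match = re.match(r"\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)", raw.strip())
    return match.group(1) if match else None

def merge_by_oclc(records):
    """Group records by OCLC number and union their URLs, formats and holdings."""
    merged = defaultdict(lambda: {"urls": set(), "formats": set(), "holdings": set()})
    for record in records:
        key = normalize_oclc(record["035a"])
        if key is None:
            continue  # a real pipeline would need a fallback for records without an OCLC number
        for name in ("urls", "formats", "holdings"):
            merged[key][name].update(record.get(name, []))
    return dict(merged)

# Two hypothetical copies of the same title held by different campuses.
records = [
    {"035a": "(OCoLC)ocm52165958", "formats": ["Book"], "holdings": ["Campus A"]},
    {"035a": "(OCoLC)52165958", "formats": ["Book"], "holdings": ["Campus B"],
     "urls": ["http://example.org/ebook"]},
]
merged = merge_by_oclc(records)
```

Both input records collapse into one entry keyed by the normalized number, with the holdings of both campuses preserved.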
  16. Catalog Extract Processing Details
      - 14 Voyager instances
      - 13M MARC bibliographic records extracted
      - Approximately 14 hours
      - Local C code
      Sorted, deduplicated and merged output:
      - 8M records
      - 10GB raw MARC data
  20. Why Merge?
      - URLs
      - Formats
      - Holdings
  22. 1b) Digital Collection Data
  24. Fedora Extract Processing Details
      - 1 Fedora repository
      - 13K “First Class” XML objects extracted
      - Approximately 4 hours
      - Repository query language
      XML output:
      - METS XML package
      - Structural XML
      - MODS bibliographic XML
      - 41MB XML data
  30. 2) Index it
  31. We take raw library data and process it with MARC/XML parsing tools and local parsing rules in order to build a Solr search index.
  32. Pipeline:
      - Raw data (MARC & METS XML)
      - Parsing libraries (Java code: marc4j, SAXParser)
      - Local code that defines parsing rules
      - Solr index
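For the XML side of this pipeline, event-based (SAX) parsing might look like the sketch below. The deck names Java's SAXParser; Python's standard `xml.sax` module is swapped in here purely for illustration. The sample document is invented, and pulling out only the MODS `title` element is an assumption about what a parsing rule might want.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

MODS_NS = "http://www.loc.gov/mods/v3"

class ModsTitleHandler(ContentHandler):
    """Collect the text of every MODS <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElementNS(self, name, qname, attrs):
        if name == (MODS_NS, "title"):
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElementNS(self, name, qname):
        if name == (MODS_NS, "title"):
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False

def extract_titles(xml_bytes):
    handler = ModsTitleHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver (namespace, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.titles

# Invented sample record, not taken from the Forward repository.
SAMPLE = b"""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample digital object</title></titleInfo>
</mods>"""
titles = extract_titles(SAMPLE)
```

A streaming parser like this never holds the whole document in memory, which is one plausible reason to prefer it for batch extracts.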
  33. 1. Raw data
  34. LEADER 02000cam a22003734a 45 0
      001 6939454
      005 20051208125417.0
      008 051104s2004 enka $b 001 0 eng
      010 $a 2003045349
      035 $a (OCoLC)ocm52165958
      040 $aDLC $c DLC $d XMA $d BAKER $d UKM
      015 $a GBA430162 $2bnb
      016 7$a 012906573 $2Uk
      020 $a 0754605175 (alk. paper)
      024 $a 99811375970
      042 $apcc
      049 $a GZMA
      050 00$a B3376.W564 $b W55355 2004
      082 00 $a 111/.85/092 $2 21
      245 00 $a Wittgenstein, aesthetics, and philosophy / $c edited by Peter B. Lewis.
      260 $aAldershot, Hants, England ; $a Burlington, VT : $bAshgate, $c c2004.
      300 $a xii, 255 p. : $b ill. ; $c 24 cm.
      440 0 $aAshgate Wittgensteinian studies
      505 0$a Wittgenstein and the aesthetic domain / Kjell S. Johannessen -- 2. Wittgenstein, anti-essentialism and the definition of art / Terry Diffey -- 3. Rules, creativity and pictures : Wittgenstein's Lectures on aesthetics / David Novitz -- 4. Criticism without theory / Mark W. Rove -- 5. On aesthetic reactions and changing one's mind / Lars Hertzberg -- 6. Wittgenstein and the arts : understanding and performing / Graham McFee -- 7. Wittgenstein's music / R.A. Sharpe -- 8. Wittgenstein on music and language / Oswald Hanfling -- 9. Ethics and aesthetics are one / Carolyn Wilde -- 10. Fiction and reality in the arts / Ilham Dilman -- 11. Literature, human understanding and morality / Ben Tilghman -- 12. 'The self, thinking' : Wittgenstein, Augustine and the autobiographical situation / Garry L. Hagberg
      504 $a Includes bibliographical references (p. 235-247) and index.
  35. 02000cam a22003734a 45 001000800000005001700008008004100025010001700066035002300083040003000106015001900136016001800155020002800173024001600201042000800217049000900225050002800234082002000262245007400282260006800356300003400424440003600458505081100494504006401305600005001369700002501419938007101444945001901515946003001534946001301564947002101577948001601598994001201614693945420051208125417.0051104s2004 enkab 001 0 eng a 2003045349 a(OCoLC)ocm52165958 aDLCcDLCdXMAdBAKERdUKM aGBA4301622bnb7 a0129065732Uk a0754605175 (alk. paper) a99811375970 apcc aGZMA00aB3376.W564bW55355 200400a111/.85/09222100aWittgenstein, aesthetics, and philosophy /cedited by Peter B. Lewis. aAldershot, Hants, England ;aBurlington, VT :bAshgate,cc2004. axii, 255 p. :bill. ;c24 cm. 0aAshgate Wittgensteinian studies0 aWittgenstein and the aesthetic domain / Kjell S. Johannessen -- 2. Wittgenstein, anti-essentialism and the definition of art / Terry Diffey -- 3. Rules, creativity and pictures : Wittgenstein's Lectures on aesthetics / David Novitz -- 4. Criticism without theory / Mark W. Rove -- 5. On aesthetic reactions and changing one's mind / Lars Hertzberg -- 6. Wittgenstein and the arts : understanding and performing / Graham McFee -- 7. Wittgenstein's music / R.A. Sharpe -- 8. Wittgenstein on music and language / Oswald Hanfling -- 9. Ethics and aesthetics are one / Carolyn Wilde -- 10. Fiction and reality in the arts / IlhamDilman -- 11. Literature, human understanding and morality / Ben Tilghman -- 12. 'The self, thinking' : Wittgenstein, Augustine and the autobiographical situation / Garry L. HagbergaIncludes bibliographical references (p. 235-247) and index.10aWittgenstein, Ludwig,d1889-1951xAesthetics.1 aLewis, Peter,d1947- aBaker & TaylorbBKTYc99.95d99.95i0754605175n0004227086sactive c1d89087961587 a714694b2005-11-23c81.86 c99.95d1 aHEUR 4801bm,stk aSCNd348032 a92bGZM
  37. 2. MARC/XML parsing libraries
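To show what a MARC parsing library has to do with the raw transmission format (ISO 2709), here is a small pure-Python sketch. It is a stand-in for illustration only; the deck says the real pipeline uses marc4j. The layout it relies on is the standard one: a 24-character leader, a directory of 12-character entries (tag, field length, start offset), then the field data. The sample values are borrowed from the record shown in the transcript, and `build_record` exists only so the example has well-formed input, since the control characters in the transcript's copy did not survive.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter

def build_record(fields):
    """Serialize [(tag, data_bytes)] into MARC transmission format."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FIELD_END
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    base = 24 + len(directory) + 1   # leader + directory + directory terminator
    total = base + len(body) + 1     # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FIELD_END + body + RECORD_END

def parse_record(raw):
    """Walk the directory and return [(tag, data_bytes)] without terminators."""
    base = int(raw[12:17])           # leader positions 12-16: base address of data
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields

def subfields(data):
    """Split a data field into (code, value) pairs, skipping the two indicators."""
    return [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
            for chunk in data.split(SUBFIELD)[1:]]

fields = [("001", b"6939454"), ("035", b"  " + SUBFIELD + b"a(OCoLC)ocm52165958")]
raw = build_record(fields)
parsed = parse_record(raw)
```

The directory is what makes the format compact but hard to read by eye: every field is located by offset rather than by a visible separator.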
  44. 3. Local code
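The slides for this step carry no text, so the following is only a guess at the general shape of "local code that defines parsing rules": a table mapping index fields to MARC tags and subfields. The field names (`title_display` and so on), the rule format, and the punctuation stripping are all hypothetical; the sample values come from the record shown in the transcript.

```python
# Hypothetical rule table: Solr field name -> (MARC tag, subfield codes to join)
RULES = {
    "title_display": ("245", "a"),
    "publisher_display": ("260", "b"),
    "oclc_number": ("035", "a"),
}

def apply_rules(record, rules=RULES):
    """Build an index document from a record.

    `record` maps a MARC tag to a list of {subfield code: value} dicts.
    """
    doc = {}
    for solr_field, (tag, codes) in rules.items():
        values = []
        for field in record.get(tag, []):
            text = " ".join(field[c] for c in codes if c in field)
            text = text.strip(" /:;,")  # drop ISBD punctuation at the edges
            if text:
                values.append(text)
        if values:
            doc[solr_field] = values
    return doc

record = {
    "245": [{"a": "Wittgenstein, aesthetics, and philosophy /",
             "c": "edited by Peter B. Lewis."}],
    "260": [{"a": "Aldershot, Hants, England ;", "b": "Ashgate,", "c": "c2004."}],
    "035": [{"a": "(OCoLC)ocm52165958"}],
}
doc = apply_rules(record)
```

Keeping the rules in data rather than in control flow makes it cheap to add or adjust an index field without touching the parsing code.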
  47. 4. Solr index
      http://lucene.apache.org/solr/
  48. What is Solr?
      An XML API over a Lucene search index.
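As a concrete picture of that XML API, the sketch below builds a Solr `<add>` update message with Python's standard library. The `<add><doc><field name="...">` structure is Solr's XML update format; the field names and the choice of the 001 value as `id` are assumptions made for the example, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

def solr_add_xml(doc):
    """Serialize a {field name: value or list of values} dict as a Solr <add> message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, values in doc.items():
        for value in values if isinstance(values, list) else [values]:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

xml_message = solr_add_xml({
    "id": "6939454",
    "title_display": ["Wittgenstein, aesthetics, and philosophy"],
})
```

In a running system a message like this would be POSTed to the Solr update handler and followed by a commit.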
  51. Access to Raw Formats
      - Raw MARC stored for merged record
      - Live calls made to Fedora web services
  52. Data Refresh
      - Bibliographic: weekly
      - Circulation status: nightly
  54. For more information, see
      http://sdg.library.wisc.edu/blog/2010/03/03/solr-marc-indexing-based-on-diffs/
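The linked post's title suggests that reindexing is driven by differences between extracts. The transcript does not say how that is done, so the following is only one plausible sketch: fingerprint every record in the previous and current extracts and reindex just the ones that are new or changed. The function names and the in-memory dicts are assumptions.

```python
import hashlib

def fingerprint(raw_record):
    """Stable digest of a raw record, used to detect changes between extracts."""
    return hashlib.sha1(raw_record).hexdigest()

def diff_extracts(previous, current):
    """Compare two extracts given as {record id: raw bytes}.

    Returns (ids to add or reindex, ids to delete from the index).
    """
    prev_hashes = {rid: fingerprint(raw) for rid, raw in previous.items()}
    to_index = sorted(rid for rid, raw in current.items()
                      if prev_hashes.get(rid) != fingerprint(raw))
    to_delete = sorted(set(previous) - set(current))
    return to_index, to_delete

previous = {"1": b"unchanged", "2": b"old version", "3": b"withdrawn"}
current = {"1": b"unchanged", "2": b"new version", "4": b"brand new"}
to_index, to_delete = diff_extracts(previous, current)
```

With millions of records and a weekly refresh, touching only the changed records is the obvious way to keep batch time down.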
  55. Web Application
  56. Frontend?
      - (X)HTML
      - JavaScript
      - Cascading Style Sheets
      - Design
      - Information Architecture
      - User experience
      - Chrome (images, icons, pretty)
  57. Forward Colophon
      - ActiveRecordBaseWithoutTable (Rails plugin)
      - Apache
      - Blacklight (Rails plugin)
      - Blueprint CSS
      - Bookreader (jQuery)
      - Capistrano
      - Crontab
      - Engines (Rails plugin)
      - Fedora
      - Freebase API
      - GeoIP (Ruby gem)
      - Google Books API
      - Haml (Rails plugin)
      - Happymapper (Ruby gem)
      - HathiTrust API
      - jQuery
      - Ken (Ruby gem)
      - LowPro (Prototype JS)
      - MARC4J
      - Passenger (modrails)
      - Prototype JS
      - PostgreSQL
      - Raphaël
      - Ruby on Rails
      - Shibboleth
      - Subversion
      - Solr / Lucene
      - Summon (Ruby gem)
      - UW-Madison Libraries Staff Directory API
      - UWDC (Rails plugin)
      - Voyager API
      - Tender love and attention
  58. Campus Affiliation
      Users localize to a school, which allows us to scope many features to their campus.
      - GeoIP (Ruby gem): match IP addresses with physical locations.
      - Raphaël (JavaScript library): “Small JavaScript library that should simplify your work with vector graphics on the web”.
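The idea of localizing a user from their IP address can be sketched as below. Forward uses the GeoIP Ruby gem, which maps addresses to physical locations; this Python version swaps in a simple lookup against known network ranges to show the principle. The campus names are placeholders and the networks are reserved documentation ranges, not real campus address space.

```python
import ipaddress

# Hypothetical campus networks (RFC 5737 documentation ranges).
CAMPUS_NETWORKS = {
    "Campus A": ipaddress.ip_network("192.0.2.0/24"),
    "Campus B": ipaddress.ip_network("198.51.100.0/24"),
}

def campus_for_ip(ip, default=None):
    """Return the campus whose network contains `ip`, or `default` if none does."""
    address = ipaddress.ip_address(ip)
    for campus, network in CAMPUS_NETWORKS.items():
        if address in network:
            return campus
    return default

guess = campus_for_ip("198.51.100.7")
```

A guess like this is best treated as a default the user can override, since off-campus and VPN users will not match any range.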
  59. Raphaël
      SVG elements, like the circles and squares on the Forward splash page, can be treated as XHTML elements, allowing us to manipulate them with JavaScript and CSS.
      http://raphaeljs.com/
  60. Campus Homepage
      Forward application stack:
      - Apache+Passenger (modrails)
      - Ruby on Rails
      - PostgreSQL
      - Apache Solr
  63. Apache+Passenger
      Phusion Passenger is an Apache module that makes deploying Ruby and Ruby on Rails applications on Apache a breeze.
      http://www.modrails.com/
  64. Ruby on Rails
      “Ruby on Rails is an open-source web framework that’s optimized for programmer happiness and sustainable productivity.”
      http://rubyonrails.org/
  65. PostgreSQL
      “PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.”
      http://www.postgresql.org/
  66. Apache Solr
      “Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.”
      http://lucene.apache.org/solr/
  67. Results
  68. Results – Three columns
  69. Results – Data sources
  70. Results – Facets – Solr
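Facets on the results page come from Solr, and a faceted request can be sketched as a plain query URL. The parameters used (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`) are standard Solr query parameters; the host, the field names `format` and `language`, and the helper function are assumptions for illustration, not Forward's actual configuration.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_fields, filters=()):
    """Build a Solr select URL that asks for facet counts alongside the results."""
    params = [("q", query), ("wt", "json"), ("rows", "10"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filters]  # one fq per facet value the user clicked
    return base_url + "/select?" + urlencode(params)

url = facet_query_url(
    "http://localhost:8983/solr",
    "wittgenstein",
    facet_fields=["format", "language"],
    filters=['format:"Book"'],
)
```

Each clicked facet value becomes another `fq` filter, so narrowing a result set never requires rewriting the user's original query.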
  71. Results – Solr + PostgreSQL + APIs
  72. Results – Context – APIs
  73. Results – Three main columns
  74. Results – CSS grid
  75. Blueprint
      “Blueprint is a CSS framework, which aims to cut down on your development time. It gives you a solid foundation to build your project on top of, with an easy-to-use grid, sensible typography, useful plugins, and even a stylesheet for printing.”
      http://blueprintcss.org/
  77. Show – Book
  78. Show – Image
  79. Show – Full Text Book
  80. Show – View Full Text Book
  81. BookReader
      “The Internet Archive BookReader is used to view books from the Internet Archive online and can also be used to view other books.”
      http://github.com/openlibrary/bookreader
  82. Challenges
  83. Challenges
      - Merging MARC, METS extracts
      - Batch processing time (time/CPU constraints)
      - Page-level indexing (Bookviewer: memory/disk constraints)
      - Voyager API
      - Organization challenges
        - big project, small shop
        - dealing with vendor silos
        - multiple cataloging standards
        - quality of services challenges
  84. Thanks!
