Solr Powered Libraries

Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content.

Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community-contributed improvements as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.

This talk will introduce you to several Solr-powered library-related systems, detail how they work, and leave you with lessons learned that can be applied to your applications.

Transcript

  • 1. Solr Powered Libraries: A survey of the world's knowledge bases
      Presented by Erik Hatcher, Thursday, May 2, 2013
      © Copyright 2013 LucidWorks
  • 2. Abstract
  • 3. Real Solar Powered Library!
      • http://www.ktsm.com/news/texas-library-runs-sunshine
  • 4. Card-carrying library geek
      • Applied Research in Patacriticism (ARP)
          - Rossetti Archive: http://www.rossettiarchive.org
          - NINES: http://www.nines.org/
          - Collex: http://www.collex.org
      • Blacklight
          - originated as an implementation of Solr Flare
      • Presentations
          - http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
          - Library of Congress: "Solr Powered Libraries" (2007)
              » http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
          - EBTI/CBETA Conference 2008
          - Publication: “Library 2.0 Initiatives in Academic Libraries”
      • Windsor Lucene Summit
      • eIFL-FOSS
  • 5. Rossetti Archive
  • 6. NINES/Collex
  • 7. Card catalog
      • the original inverted index
      Image: http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
  • 8.
      • http://openlibrary.org/
          - project of the Internet Archive
      • Goal: "A (community editable) web page for every book"
  • 9. dp.la - Digital Public Library of America
      Lucene/ElasticSearch Powered
  • 10. Wikimedia/Wikipedia/MediaWiki
      • Solr powered: translation memory service, GeoData extension, etc.
      • "heavily modified Lucene" powers main site search currently
  • 11. HathiTrust
      • "partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future."
      • 10.5M books, 12TB OCR+metadata, hundreds of languages
          - "Books are different"
          - http://code4lib.org/conference/2013/burton-west
      • http://www.hathitrust.org/blogs/large-scale-search
          - http://www.hathitrust.org/blogs/large-scale-search/too-many-words
          - "org.apache.solr.common.SolrException: Impossible Exception"
          - CommonGrams
          - word segmentation: autoGeneratePhraseQueries="false"
      • HathiTrust Research Center
          - The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “sandbox”. The sandbox runs against non-Google scanned content (about 260,000 volumes) and provides a test-bed for interested researchers to experiment with writing their own algorithms for use in the HTRC infrastructure.
  • 12. Smithsonian Institution
      • http://collections.si.edu
      • Many disparate data sources:
          - 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, research centers in Panama, Boston, New York, Maryland, and Virginia
      • "Documents" of all varieties:
          - Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
      • User tagging, negative/exclude filtering, DIH SolrEntityProcessor
      • http://bit.ly/13P41YJ
          - http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
  • 13.
  • 14.
  • 15. SerialsSolutions Summon
      • http://www.serialssolutions.com/en/services/summon
      • SaaS, single unified index, match & merge
  • 16. Astrophysics Data System Labs
      • Smithsonian, NASA, Harvard
      • http://adslabs.org
      http://code4lib.org/conference/2013/luker
  • 17.
      • vufind.org
      • Powers main HathiTrust UI (currently) and many more
          - see http://vufind.org/wiki/installation_status
  • 18.
  • 19.
      • "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects."
          - http://projectblacklight.org
      • Founded at the University of Virginia (2007): search.lib.virginia.edu
          - UV-A solar radiation == blacklight
      • Initial contributors: UVa, Stanford, JHU, WGBH
      • University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian govt (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
      • http://projecthydra.org/ uses Blacklight as UI component
  • 20. searchworks at Stanford
  • 21. Advanced search at Stanford's searchworks
  • 22. searchworks: Mapping Text Boxes to Solr query pieces
      • http://code4lib.org/conference/2010/dushay_keck
  • 23. https://catalyst.library.jhu.edu/
  • 24. Rock and Roll!
      • m/
  • 25. Community and Resources
      • code4lib:
          - http://www.code4lib.org/
      • HathiTrust folks
          - http://www.hathitrust.org/blogs/large-scale-search
          - http://robotlibrarian.billdueber.com/
      • http://bighumanities.net/
          - The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place between 6 and 9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of “big data”.
  • 26. http://heatherbrewer.com/blog/2013/04/15/libraries-rock/
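Slide 7 calls the card catalog "the original inverted index". Each card files a book under its headings, much as Lucene maps every term to the documents that contain it. Below is a minimal Python sketch of that idea. It assumes naive lowercase whitespace tokenization; Lucene's real analysis chain and postings format are far richer. The function names and sample records are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive tokenizer (assumption)
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Boolean AND: ids of documents that contain every query term."""
    lists = [set(index.get(term, [])) for term in query.lower().split()]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

# Hypothetical sample records.
catalog = {
    1: "Rossetti Archive poems and pictures",
    2: "Digital archive of nineteenth century poems",
    3: "Card catalog history",
}
index = build_inverted_index(catalog)
print(index["poems"])                      # [1, 2]
print(search_all(index, "digital poems"))  # [2]
```

An AND query is an intersection of posting lists. A catalog user did the equivalent by pulling cards from two drawers and keeping the books that appear in both.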
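Slide 11 lists CommonGrams among HathiTrust's large-scale search techniques. The idea is to index a bigram for each adjacent pair that involves a very frequent word, alongside the ordinary unigrams. A phrase query such as "the house of life" can then consult the short posting lists for bigrams like `of_life` instead of the enormous list for `of`.

The following is a simplified Python sketch of the index-time behavior only. It assumes the caller supplies the common-word set. It is an illustration of the technique, not Lucene's `CommonGramsFilter` source.

```python
def common_grams(tokens, common_words, sep="_"):
    """Simplified index-time CommonGrams.

    Every token is kept as a unigram. In addition, each adjacent pair in which
    either word is common is emitted as a bigram at the first word's position,
    so the bigram overlays that unigram.
    Returns a list of (term, position) tuples.
    """
    out = []
    for pos, tok in enumerate(tokens):
        out.append((tok, pos))
        if pos + 1 < len(tokens):
            nxt = tokens[pos + 1]
            if tok in common_words or nxt in common_words:
                out.append((tok + sep + nxt, pos))
    return out

print(common_grams("the quick brown fox".split(), {"the"}))
# [('the', 0), ('the_quick', 0), ('quick', 1), ('brown', 2), ('fox', 3)]
```

Pairs of rare words ("quick brown", "brown fox") produce no bigrams. The index therefore grows only where it helps phrase queries the most.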
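Slide 12 lists negative/exclude filtering among the Smithsonian collections site's features. In Solr, one common way to express an exclusion is a filter query (`fq`) with a leading minus. Solr accepts this as a pure negative query and removes the matching documents from the results.

Below is a minimal sketch of building such `/select` parameters. The field names (`unit`, `object_type`) and values are hypothetical, and this is not the Smithsonian's actual implementation.

```python
from urllib.parse import urlencode

def phrase_clause(field, value):
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

def build_select_params(q, include=(), exclude=(), rows=10):
    """Return (name, value) pairs for a Solr /select request.

    Each include pair becomes a positive filter query. Each exclude pair becomes
    a pure-negative filter query like -field:"value". Filter queries restrict
    the result set without changing relevance scores.
    """
    params = [("q", q), ("rows", str(rows))]
    for field, value in include:
        params.append(("fq", phrase_clause(field, value)))
    for field, value in exclude:
        params.append(("fq", "-" + phrase_clause(field, value)))
    return params

# Hypothetical fields and values.
params = build_select_params(
    "rockets",
    include=[("unit", "National Air and Space Museum")],
    exclude=[("object_type", "Postcards")],
)
query_string = urlencode(params)  # repeated fq keys are kept as separate parameters
print(query_string)
```

Each exclusion is kept as its own `fq` rather than folded into `q`. Solr caches filter queries independently, so a frequently used exclusion can be reused across searches.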
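Slide 15 describes Summon as a single unified index built with "match & merge". The slide does not say how Summon matches records, and its rules are not public here. The Python below is therefore a hypothetical illustration of the general technique: records from several sources are grouped by a match key before indexing, and their fields are combined. The key (a normalized ISBN, else normalized title plus year) and the sample records are assumptions.

```python
import re

def match_key(record):
    """Hypothetical match key: normalized ISBN when present, else title + year."""
    isbn = re.sub(r"[^0-9X]", "", record.get("isbn", "").upper())
    if isbn:
        return ("isbn", isbn)
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    return ("title", " ".join(title.split()), record.get("year"))

def match_and_merge(records):
    """Group records that share a match key. Keep the first non-empty value
    seen for each field and remember every contributing source."""
    merged = {}
    for rec in records:
        target = merged.setdefault(match_key(rec), {"sources": []})
        target["sources"].append(rec["source"])
        for field, value in rec.items():
            if field != "source" and value and field not in target:
                target[field] = value
    return list(merged.values())

# Hypothetical records from three sources; the first two describe the same book.
records = [
    {"source": "catalog", "title": "Letters from the Field", "year": 1901, "isbn": ""},
    {"source": "vendor", "title": "Letters from the field.", "year": 1901, "publisher": "Example Press"},
    {"source": "repository", "title": "Field Notes", "year": 1905},
]
for item in match_and_merge(records):
    print(item)
```

"First non-empty value wins" is the simplest merge policy. A production system would more likely rank sources by trustworthiness for each field.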
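Slide 19 describes Blacklight as a discovery interface for "any Solr index". The facet lists such an interface shows beside the results come from Solr's faceting. Blacklight itself is a Ruby on Rails gem, so the Python below is not its code. It is a minimal sketch of the Solr side only. It relies on the standard `facet`, `facet.field`, and `facet.mincount` request parameters and on Solr's default flat JSON layout for `facet_fields`. The field names and counts are hypothetical.

```python
def facet_params(q, fields, mincount=1):
    """Request parameters asking Solr to facet on the given fields."""
    params = [("q", q), ("facet", "true"), ("facet.mincount", str(mincount))]
    params += [("facet.field", field) for field in fields]
    return params

def parse_facet_fields(response):
    """Convert Solr's flat JSON facet lists, [value, count, value, count, ...],
    into {field: [(value, count), ...]}."""
    raw = response.get("facet_counts", {}).get("facet_fields", {})
    return {field: list(zip(vals[0::2], vals[1::2])) for field, vals in raw.items()}

# Hypothetical response fragment in the shape of Solr's default JSON output.
sample = {
    "facet_counts": {
        "facet_fields": {
            "format": ["Book", 120, "Video", 15],
            "language": ["English", 98, "French", 30],
        }
    }
}
print(parse_facet_fields(sample)["format"])  # [('Book', 120), ('Video', 15)]
```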
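Slide 22 points to the 2010 code4lib talk on mapping searchworks' advanced-search text boxes to pieces of a Solr query. One possible approach is sketched below; it is not necessarily the one Stanford used. Each filled-in box becomes a nested `_query_` clause that runs edismax over that box's fields. The user's text and the field list are passed by parameter reference (`v=$q0`, `qf=$qf0`), so the text is never spliced into `q` and needs no escaping there. The box names, field names, and the `advanced_search_params` helper are hypothetical.

```python
def advanced_search_params(boxes, box_fields, op="AND"):
    """Turn filled-in advanced-search boxes into Solr request parameters.

    Each non-empty box becomes a nested edismax clause. Its text and its field
    list travel in separate parameters (qN, qfN) that the clause dereferences.
    """
    params = {"defType": "lucene"}
    clauses = []
    for i, (box, text) in enumerate(boxes.items()):
        if not text.strip():
            continue  # empty boxes contribute nothing to the query
        q_name, qf_name = f"q{i}", f"qf{i}"
        params[q_name] = text
        params[qf_name] = box_fields[box]
        clauses.append(f'_query_:"{{!edismax qf=${qf_name} v=${q_name}}}"')
    params["q"] = f" {op} ".join(clauses) if clauses else "*:*"
    return params

# Hypothetical form values and field lists.
form = {"title": "letters from the field", "author": "smith", "subject": ""}
fields = {"title": "title_t subtitle_t", "author": "author_t", "subject": "subject_t"}
p = advanced_search_params(form, fields)
print(p["q"])
# _query_:"{!edismax qf=$qf0 v=$q0}" AND _query_:"{!edismax qf=$qf1 v=$q1}"
```

Because every box keeps its own `qf`, each clause can weight its fields independently. For example, the title box might boost a main-title field over a subtitle field.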