Solr Powered Libraries

Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content.

Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.

This talk will introduce you to several Solr powered library-related systems, detail how they work, and leave you with lessons learned that can be applied to your applications.

Transcript

  • 1. © Copyright 2013 LucidWorks
    Solr Powered Libraries: A survey of the world's knowledge bases
    May 2, 2013
    Presented by Erik Hatcher
    Thursday, May 2, 13
  • 2. Abstract
  • 3. Real Solar Powered Library!
    • http://www.ktsm.com/news/texas-library-runs-sunshine
  • 4. Card-carrying library geek
    • Applied Research in Patacriticism (ARP)
      - Rossetti Archive: http://www.rossettiarchive.org
      - NINES: http://www.nines.org/
      - Collex: http://www.collex.org
    • Blacklight
      - originated as an implementation of Solr Flare
    • Presentations
      - http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
      - Library of Congress: "Solr Powered Libraries" (2007)
        » http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
      - EBTI/CBETA Conference 2008
      - Publication: "Library 2.0 Initiatives in Academic Libraries"
    • Windsor Lucene Summit
    • eIFL-FOSS
  • 5. Rossetti Archive
  • 6. NINES/Collex
  • 7. Card catalog
    • the original inverted index
    http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
  • 8.
    • http://openlibrary.org/
      - project of the Internet Archive
    • Goal: "A (community editable) web page for every book"
  • 9. dp.la - Digital Public Library of America
    Lucene/ElasticSearch Powered
  • 10. Wikimedia/Wikipedia/MediaWiki
    • Solr powered: translation memory service, GeoData extension, etc.
    • "heavily modified Lucene" powers main site search currently
  • 11. HathiTrust
    • "partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future."
    • 10.5M books, 12TB OCR+metadata, hundreds of languages
      - "Books are different"
      - http://code4lib.org/conference/2013/burton-west
    • http://www.hathitrust.org/blogs/large-scale-search
      - http://www.hathitrust.org/blogs/large-scale-search/too-many-words
      - "org.apache.solr.common.SolrException: Impossible Exception"
      - CommonGrams
      - word segmentation: autoGeneratePhraseQueries="false"
    • HathiTrust Research Center
      - The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development "sandbox". The sandbox runs against non-Google scanned content (about 260,000 volumes) and provides a test-bed for interested researchers to experiment with writing their own algorithms for use in the HTRC infrastructure.
  • 12. Smithsonian Institution
    • http://collections.si.edu
    • Many disparate data sources:
      - 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, research centers in Panama, Boston, New York, Maryland, and Virginia
    • "Documents" of all varieties:
      - Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
    • User tagging, negative/exclude filtering, DIH SolrEntityProcessor
    • http://bit.ly/13P41YJ
      - http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
  • 15. Serials Solutions Summon
    • http://www.serialssolutions.com/en/services/summon
    • SaaS, single unified index, match & merge
  • 16. Astrophysics Data System Labs
    • Smithsonian, NASA, Harvard
    • http://adslabs.org
    http://code4lib.org/conference/2013/luker
  • 17. vufind.org
    • Powers main HathiTrust UI (currently) and many more
      - see http://vufind.org/wiki/installation_status
  • 19.
    • "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects."
      - http://projectblacklight.org
    • Founded at the University of Virginia (2007): search.lib.virginia.edu
      - UV-A solar radiation == blacklight
    • Initial contributors: UVa, Stanford, JHU, WGBH
    • University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian govt (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
    • http://projecthydra.org/ uses Blacklight as its UI component
  • 20. SearchWorks at Stanford
  • 21. Advanced search at Stanford's SearchWorks
  • 22. SearchWorks: Mapping text boxes to Solr query pieces
    • http://code4lib.org/conference/2010/dushay_keck
  • 23.
    • https://catalyst.library.jhu.edu/
  • 24. Rock and Roll!
    • m/
  • 25. Community and Resources
    • code4lib:
      - http://www.code4lib.org/
    • HathiTrust folks
      - http://www.hathitrust.org/blogs/large-scale-search
      - http://robotlibrarian.billdueber.com/
    • http://bighumanities.net/
      - The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place 6-9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of "big data".
  • 26. http://heatherbrewer.com/blog/2013/04/15/libraries-rock/
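Slide 7 calls the card catalog "the original inverted index": each card maps a heading to the items filed under it. The following is a minimal Python sketch of that idea, assuming plain whitespace tokenization and lowercasing; the function names and sample records are illustrative only, and Lucene's real index also stores term positions, frequencies, and per-field data.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each lowercased term to the set of record IDs containing it,
    much like a card catalog maps a heading to the items filed under it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            index[term].add(record_id)
    return index


def search_all(index, query):
    """Return the IDs of records that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first term's postings, then intersect with the remaining terms.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Illustrative records, not real catalog data.
    catalog = {
        "rec1": "The House of Mirth by Edith Wharton",
        "rec2": "Ethan Frome by Edith Wharton",
        "rec3": "The Great Gatsby by F. Scott Fitzgerald",
    }
    idx = build_inverted_index(catalog)
    print(sorted(search_all(idx, "edith wharton")))  # ['rec1', 'rec2']
    print(sorted(search_all(idx, "the")))            # ['rec1', 'rec3']
```

Looking a term up touches only that term's postings, which is why both a drawer of cards and a Lucene index stay fast as the collection grows.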
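Slide 11 lists CommonGrams among the techniques discussed on HathiTrust's large-scale search blog. The general idea is that, alongside every single word, the index also records a two-word token whenever either word of an adjacent pair is on a list of very common words. A phrase such as "house of mirth" can then match on the rarer pairs instead of walking the enormous postings list for "of". The Python sketch below is simplified: the underscore joiner and the common-word list are assumptions made for illustration, and it ignores the token positions and offsets that Lucene's CommonGramsFilter also has to manage.

```python
def common_grams(tokens, common_words):
    """Emit every unigram, plus a bigram for each adjacent pair in which
    at least one word is in common_words (a simplified CommonGrams)."""
    output = []
    for i, token in enumerate(tokens):
        output.append(token)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if token in common_words or nxt in common_words:
                # Join the pair into a single token; "_" is this sketch's choice.
                output.append(f"{token}_{nxt}")
    return output


if __name__ == "__main__":
    common = {"the", "of", "to", "be", "or", "not"}  # illustrative list
    print(common_grams("the house of mirth".split(), common))
    # ['the', 'the_house', 'house', 'house_of', 'of', 'of_mirth', 'mirth']
```

Pairs made only of rare words ("quick brown") produce no bigram, so the index grows only where the common words would otherwise make phrase queries slow.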
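Slide 12 lists negative/exclude filtering among the Smithsonian collection-search features. In Solr, one common way to express an exclusion is a filter query with a leading minus (`fq=-field:value`), which drops matching documents without affecting relevance scoring. The Python sketch below builds such request parameters; the field names `object_type` and `data_source` and the core URL are hypothetical, not the Smithsonian's actual schema or endpoint.

```python
from urllib.parse import urlencode


def solr_phrase(value):
    """Quote a field value as a Solr phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_select_params(query, include=None, exclude=None):
    """Build Solr /select parameters. Include filters narrow the results;
    exclude filters drop matching documents. Each filter is its own fq,
    so Solr can cache each one independently."""
    params = [("q", query)]
    for field, value in (include or {}).items():
        params.append(("fq", f"{field}:{solr_phrase(value)}"))
    for field, value in (exclude or {}).items():
        # A leading minus turns this into a pure negative filter query.
        params.append(("fq", f"-{field}:{solr_phrase(value)}"))
    return params


if __name__ == "__main__":
    # Hypothetical field names; a real schema will differ.
    params = build_select_params(
        "rockets",
        include={"object_type": "Photographs"},
        exclude={"data_source": "National Zoo"},
    )
    print("http://localhost:8983/solr/collection1/select?" + urlencode(params))
```

Keeping the exclusion in `fq` rather than in `q` means a user's "exclude this source" click narrows the result set without changing how the remaining documents are ranked.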
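Slide 22 points to the SearchWorks talk on mapping advanced-search text boxes to Solr query pieces. One way to do this with standard Solr features is to turn each non-empty box into a nested `_query_` clause that runs the edismax parser against that box's own field list, supplied through parameter dereferencing (`qf=$title_qf`), and then join the clauses with a boolean operator. The sketch below assumes this approach; the box names, the `<box>_qf` parameter naming scheme, and the field boosts are hypothetical and are not SearchWorks' actual code.

```python
def escape_for_nested(text):
    """Escape backslashes and double quotes so the text can sit inside
    the quoted value of a _query_ clause."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_advanced_query(boxes, operator="AND"):
    """Turn {box_name: user_text} into a Solr q string with one nested
    edismax query per non-empty box. Each box's fields are read from a
    request parameter named '<box>_qf' (hypothetical naming scheme)."""
    clauses = []
    for box, text in boxes.items():
        text = text.strip()
        if not text:
            continue  # ignore text boxes the user left empty
        clauses.append(f'_query_:"{{!edismax qf=${box}_qf}}{escape_for_nested(text)}"')
    return f" {operator} ".join(clauses)


if __name__ == "__main__":
    params = {
        "q": build_advanced_query(
            {"title": "house of mirth", "author": "wharton", "subject": ""}
        ),
        # Hypothetical field lists, dereferenced by $title_qf and $author_qf.
        "title_qf": "title_t^5 title_alt_t",
        "author_qf": "author_t",
    }
    print(params["q"])
    # _query_:"{!edismax qf=$title_qf}house of mirth" AND _query_:"{!edismax qf=$author_qf}wharton"
```

Keeping each box's field list in a separate request parameter lets the boosts live in server-side configuration while the user's text is only ever placed inside an escaped, quoted clause.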