Search Across Multiple VIVO Instances
A demonstration of vivosearch.org and the open source tools that were developed to build the site.

Presented by Brian Caruso, Miles Worthington and Nick Cappadona on Thursday, August 25 at the 2011 VIVO Conference in National Harbor, MD, USA.

  • Alternative: Search Across the Seven Partner VIVO Instances
  • Brief intro and background on how we got to this point
  • Ontologies
    * Bibontology for publications
    * FOAF for people and organizations
    * eagle-i for scientific and research resources
    * Defines the common thread across institutions (tap into this for search faceting/filtering)
  • * Uniform Resource Identifier - a string used to identify a resource on the web
    * Resource Description Framework - a generic graph-based data model for describing things, including relationships to other things
    * HyperText Transfer Protocol - simple, universal mechanism for requesting and retrieving resources or descriptions of resources
  • * more than simply installing the VIVO app
    * efforts of Implementation and Outreach teams are too often overlooked
    * the buy-in and support from administration and faculty are critical
    * VIVO implemented at institutions and organizations beyond the 7 on the grant
      - University of Colorado
      - Stony Brook
      - there are definitely others...ask Elly for latest numbers?
  • * local systems of record (SORs) are key - you could load all of the data manually but that’s no fun
    * Harvester is available as a subproject on SourceForge - library of ETL tools
      - extract, transform, load
      - initial integration of Harvester in VIVO 1.3
    * Even with automated ingest, you’ll still want to edit/add information on an individual basis
  • * so we’ve laid down this foundation and we now have the VIVO app running at the 7 partner institutions, but how do we tie all of this data together and start using it to help us discover new collaborations?
    * make connections...the start of a network (probably too loaded of a term)
    * was thinking of listing out the URLs of the seven VIVO partner instances prior to this slide while I spoke about the points above, but felt it wasn’t necessary
    * vivosearch.org
      - an example site that searches the VIVO instances at the 7 partner institutions on the grant
      - also includes Harvard Catalyst Profiles as evidence of interoperability with external apps
    * go right into the search - start with a suggested term
    * provide a scenario or 2 that makes use of faceting
      - need to come up with these
    * follow a result to the source institution
  • * reiterate that this is an example site :)
    * HCP has aligned itself with the VIVO core ontology, serves profiles data as RDF
    * these 2 tools are works in progress and are free for you to download and use in building a similar search site
    * we’d like to show you a closer look at each of these and provide some details on how you can build a search site of your own
    * need to add the link for the sandbox project once it’s online at Drupal.org
  • Pass off to Brian Caruso to work his magic
  • * emphasize that the end result is a Solr index
    * we use Solr because it’s proven and it’s fast
    * revisit Linked Data
      - making HTTP requests to VIVO instances and retrieving RDF using URIs
  • * alternate title: LDIB Minimum Requirements
    * alternate title: LDIB Ingredients
    * HCP is an example of one such non-VIVO site (although it doesn’t serve Linked Data -- one URI for both HTML and RDF representations)
    * this list of URIs defines what will be retrieved and indexed in subsequent requests
  • * All steps during index building
      - share very little state
      - should be very parallelizable
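The note above about index building steps sharing little state can be illustrated with a minimal Python sketch. This is not the LDIB code: `index_one` is a hypothetical stand-in for "fetch one individual and convert it to an index document", the `/individual/` URI pattern and the example URIs are assumptions, and `site_url` is borrowed from the field name on the "LDIB to do" slide. Because each call depends only on its own URI, the calls can be handed to a worker pool without any coordination.

```python
from concurrent.futures import ThreadPoolExecutor


def index_one(uri: str) -> dict:
    """Hypothetical per-URI step: uses only its argument, no shared state."""
    # Assumption: individual URIs look like <site>/individual/<local name>.
    site_url = uri.split("/individual/")[0]
    return {"id": uri, "site_url": site_url}


def build_documents(uris, workers=4):
    """Run the per-URI step in parallel; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(index_one, uris))


# Made-up URIs for illustration only.
example_uris = [
    "http://vivo.cornell.edu/individual/example1",
    "http://vivo.cornell.edu/individual/example2",
]
documents = build_documents(example_uris)
```

A thread pool is only one option here; since the steps are independent, the same function could equally be spread across processes or machines.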
  • Provide service to link individuals from one VIVO to individuals in another VIVO instance
  • * Solr is highly scalable
      - distributed indexes
      - used by netflix, monster.com, digg
  • Should we introduce the Solr schema here or anywhere else or is it just not worth getting into that level of detail in this presentation?
    * the fact that we are manually curating the site information should reinforce that we currently have no registration or signup system beyond “email Brian Caruso...”
    * should we mention scaling here? What do we want to say besides “scaling with Solr”
      - are there any particular example projects/numbers we want to point to?
  • - Search is a goal-oriented activity. Users are typically not searching for fun. Get out of their way.
    - Google and others have established UI patterns that users are comfortable with. The UI itself is not where we want to experiment.
    - It seems so simple and familiar, but there are many subtleties in a search interface.
    - Usability testing is not negotiable.
  • * we don’t want to give you the wrong impression, that using only these 2 open source tools you can build this exact site, pixel for pixel.
    * additional utilities for facets, class group taxonomy, institution management
  • * show barebones D7 site with default theme and VIVO search module to illustrate extra work
    * place screenshot here and then also quickly demo this live
    * we will also need Apache Solr Search Integration module as well (anything else)?
    * I will work on this tonight/tomorrow
  • * focusing on the fact that it’s more than just these 2 tools is not the point
    * instead bring the focus to Solr
    * explain why we chose it
    * illustrate the flexibility it provides
    * Drupal is not a requirement, just one example
    * demo AJAX Solr site connected to Rollins index
    I will work on the AJAX Solr site tonight/tomorrow as well.

Search Across Multiple VIVO Instances Presentation Transcript

  • 1. Search Across Multiple VIVO Instances. Brian Caruso, Miles Worthington, Nick Cappadona. Albert R. Mann Library, Cornell University
  • 2. Building the foundation • VIVO core ontology • Linked Data • Implementation & Adoption • Ingest & Editing
  • 3. VIVO core ontology • A hierarchy of classes and properties • Incorporates segments of established ontologies – Bibontology – FOAF – eagle-i • Provides structure for modeled data http://vivoweb.org/ontology/core
  • 4. Linked Data “A set of best practices for publishing and connecting structured data on the Web” • URIs • RDF • HTTP http://linkeddata.org
  • 5. Implementation & Adoption • VIVO implemented at 7 partner institutions – Cornell University – University of Florida – Indiana University – Washington University in St. Louis School of Medicine – Ponce School of Medicine – Weill Cornell Medical College – The Scripps Research Institute • Buy-in and support
  • 6. Ingest & Editing • Identify local systems of record – HR – Grants – Faculty Activity • Load data – Harvester – Ingest Tools • Curation and self-editing
  • 7. vivosearch.org: A Demonstration
  • 8. vivosearch.org • An example of multi-institutional search • Includes 7 partner institutions – plus Harvard Catalyst Profiles • Built using 2 tools developed on the grant – Linked Data Index Builder – VIVO Search Drupal module • Both are open source and available today – http://vivosearch.org/tools
  • 9. Preparing Linked Data for search: Linked Data Index Builder http://vivosearch.org/tools
  • 10. Linked Data Index Builder • A tool to create a Solr index from VIVO sites • Linked Data principles – URIs – RDF – HTTP • Solr – open source enterprise search platform – http://lucene.apache.org/solr
  • 11. LDIB input • URL of VIVO instance – or any site serving LD aligned with VIVO core ontology – http://vivo.cornell.edu • Method/service to retrieve list of URIs – provided in VIVO through Index page – http://vivo.cornell.edu/browse
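As a rough illustration of the two inputs listed on the slide above (a site URL and a list of URIs), the Python sketch below prepares one HTTP request per URI that asks for an RDF representation. It is an assumption-laden sketch rather than how LDIB works internally: the individual URIs are made up, `build_rdf_request` is a hypothetical helper, and requesting `application/rdf+xml` through the `Accept` header is assumed to be how the site exposes its Linked Data. The requests are only constructed, not sent.

```python
from urllib.request import Request

# The site URL is the example from the slide; the individual URIs are
# invented for illustration and do not refer to real profiles.
SITE_URL = "http://vivo.cornell.edu"
INDIVIDUAL_URIS = [
    SITE_URL + "/individual/example1",
    SITE_URL + "/individual/example2",
]


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) a GET request asking for RDF/XML."""
    # Assumption: the server honors content negotiation via Accept.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


requests_to_send = [build_rdf_request(uri) for uri in INDIVIDUAL_URIS]
```

Each prepared request could then be passed to `urllib.request.urlopen` and the returned RDF handed to whatever step builds the index document.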
  • 12. LDIB process
  • 13. Map of our Solr system
  • 14. Map of our Solr system
  • 15. LDIB to do • Improve fault tolerance • Automate update/sync • Experiment with scaling • Management tools – need governance model to design tools – site_name and site_url are manually curated – no registration system in place
  • 16. Searching a LDIB index with Drupal: VIVO search Drupal module http://vivosearch.org/tools
  • 17. Why Drupal? • Need a website as well • Can tap into core search features • Existing framework for connecting to Solr
  • 18. Apache Solr search integration module • Flexible, not limited to Drupal content • Active community • Commercially backed
  • 19. VIVO search module • Built for Drupal 7 • Works on top of the existing Drupal module • Uses Drupal’s core search system • Packaged with 3 search facets: classgroup, type, institution • Written specifically for LDIB indexes
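To make the three facets named on the slide above more concrete, here is a small Python sketch of a Solr query URL that requests them. Several things are assumptions: the endpoint `http://localhost:8983/solr/select` is a placeholder, the Solr field names are assumed to match the facet names on the slide (`classgroup`, `type`, `institution`), and `facet_query_url` is a hypothetical helper, not part of the Drupal module. The query parameters themselves (`q`, `fq`, `wt`, `facet`, `facet.field`, `facet.mincount`) are standard Solr request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; a real deployment would use its own host and core.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Facet names from the slide, assumed here to also be the Solr field names.
FACET_FIELDS = ("classgroup", "type", "institution")


def facet_query_url(term, filters=None):
    """Build a Solr select URL that asks for counts on each facet field."""
    params = [
        ("q", term),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    params += [("facet.field", field) for field in FACET_FIELDS]
    # Each selected facet value narrows the result set via a filter query.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)


url = facet_query_url("genomics", {"institution": "Cornell University"})
query = parse_qs(urlsplit(url).query)
```

Selecting a facet value in the interface then corresponds to adding one more `fq` entry, which is why facets behave like filters layered on the original search term.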
  • 20. Developing a search site: VIVOsearch.org interface
  • 21.
  • 22. User priorities • All users want relevant search results • Most users demand quick search results • Some users want to manipulate search results http://searchuserinterfaces.com
  • 23. Development priorities 1. Relevance 2. Performance 3. Controls http://searchuserinterfaces.com
  • 24. Relevance • Good result ranking • Scannable results • Clear context • Result totals • Handle empty results
  • 25. Performance • More critical than usual • Don’t interrupt users’ train of thought • Users will quickly abandon your site
  • 26. Performance “Web search engines typically show ten results, or “hits,” per page, with hyperlinks to additional pages of results .... a Google VP reported that despite the fact that users said they wanted more hits per page, an experiment in which the number of hits was increased to 30 hits per page showed a 20% reduction in traffic (Linden, 2006). The reason turned out to be that while the page with 10 results took 0.4 seconds to generate, the page with 30 results took 0.9 seconds on average.” http://searchuserinterfaces.com/book/sui_ch5_retrieval_results.html
  • 27. Performance enhancements • Solr • Apache mod_pagespeed • Lots of caching • Data URIs for CSS images • CSS/JS aggregation and compression
  • 28. Controls • Strive for predictability and consistency • Facets must be intuitive • Offer an escape route
  • 29. Usability testing • 5 sessions • Covered tasks for entire site • Results overall positive • Revealed issues with controls
  • 30.
  • 31.
  • 32. Future enhancements • Improved result ranking • More informative text snippets • Spelling and term suggestions • Configuration for VIVO search module
  • 33. Build a search site using the tools we developed: Roll Your Own
  • 34. More than meets the eye • vivosearch.org > LDIB + Drupal module – theme – additional utilities
  • 35.
  • 36. Look Mom, no Drupal • Solr is the key • choose your weapon for integration – http://wiki.apache.org/solr/IntegratingSolr • Drupal is not a requirement
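Since the slide above stresses that any client able to talk to Solr will do, the following Python sketch reads facet counts out of a Solr JSON response without Drupal. The response body is a hand-written sample with made-up counts, not output from the vivosearch.org index; it assumes Solr's usual JSON layout in which each entry under `facet_fields` is a flat list alternating value and count. The helper `facet_counts` is hypothetical.

```python
import json

# Hand-written sample shaped like a Solr JSON response; the counts are
# invented for illustration.
SAMPLE_RESPONSE = json.dumps({
    "response": {"numFound": 3, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "institution": ["Cornell University", 2, "University of Florida", 1],
        }
    },
})


def facet_counts(raw_json: str, field: str) -> dict:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = json.loads(raw_json)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


institution_counts = facet_counts(SAMPLE_RESPONSE, "institution")
```

The same few lines translate readily to JavaScript or any other language with a JSON parser, which is the point of treating the Solr index, rather than the Drupal module, as the integration surface.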
  • 37. Questions? Thank You • Brian Caruso brian.caruso@cornell.edu • Miles Worthington miles.worthington@cornell.edu • Nick Cappadona nick.cappadona@cornell.edu • vivo-dev-all@lists.sourceforge.net