VIVO: enabling the discovery of research and scholarship


An introduction to VIVO, an open source, semantic web application that enables discovery of research and scholarship across institutions and one library's role in its implementation and development.

Published in: Education, Technology

  • Institutional VIVOs and other compatible profiling applications are producing data to form a rich network of information that can be searched to foster collaboration across institutions and enable open sharing of research discovery.
  • A prototype of a national research resource discovery network–one that will help biomedical scientists search for and find previously invisible, but highly valuable, resources.
  • Also: easily move institutional data from one system to another; recruit graduate students; assemble specialized review panels; plan budgets, services, and resources.
  • The technology can easily accommodate changes based on the demands that users make as they utilize VIVO. Other data sources we're working toward include grants, patent data, course responsibilities, and more.
  • Data is reused and repurposed in a wide array of tools and settings. Cornell University has done a stellar job of this – using VIVO data to provide current information about faculty and their interests for department and college websites; University of Florida reuses data from their VIVO for their CTSI member database – a move that other institutions will make, as well. VIVO data is available for reuse by web pages, applications, and other consumers both within and outside the institution. Individuals may access the browse and search functionalities of VIVO anytime via the web. Researchers, scholars, students, administrators, funding agencies, donors, and members of the general public may all benefit from using VIVO.
  • Here is a map of all of VIVO’s “adoptions, partners, and implementation work.” 50+ U.S.-based sites are at some stage of implementation. 19+ countries have implementation sites or are moving forward with work around VIVO, including Brazil, Costa Rica, Puerto Rico, Mexico, U.S., Canada, England, Netherlands, India, Bangladesh, Saudi Arabia, Jordan, Australia, and China. Many of these sites are starting to think about VIVO as a back end for data.
  • We have a lot of exciting collaborations that are ongoing with a number of groups…. For those of you who will only give up enterprise software from your cold dead hands, you can pay for third-party support (Wellspring).
  • Hell no. 1. VIVO is open source. You may download the application, manipulate the source code, do whatever you want to it. 2. The data in VIVO is open and in a standard accessible format. It can be freely consumed by any third party as machine-readable metadata. 3. VIVO is authoritative data. In this age of information glut, the essential question is no longer, “Does the data exist?” but “What is its provenance? Where did it come from? Is it reliable?” VIVO is designed to easily and recurrently draw from authoritative resources like Faculty Affairs or a grants database or, say, PubMed.
  • Many of us have experience updating web pages, so we know that when you add a bit of content to a plain HTML page, you don't specify what it is. But making a semantic assertion is different. With semantic data, you are adding content, specifying what type of content it is (for example: is it a grant? A presentation? A department? An NGO?), and specifying any relationships that data has with other data.
  • Each element – subject, predicate, object – is governed by ontologies with semantics. VIVO 1.2 includes an ontology module representing research resources such as biological specimens, human studies, instruments, organisms, protocols, reagents, and research opportunities. This module is aligned with the top-level ontology classes and properties from the NIH-funded eagle-i Project. We're also developing extensions to the ontology that would allow more diverse types of efforts to be included in the profiles (blogs are currently available, wiki edits are not) – very important for microattribution/nanopublication efforts.
  • Some of the developers at Indiana have developed a SPARQL query builder.
  • Anyone can create a third-party VIVO to search across multiple VIVOs.
  • For example, here is a site that leverages the power of semantic data to search across VIVO sites. This exemplar contains RDF from the seven implementation sites and Harvard Profiles. In addition, University of Iowa, which uses a system called Loki, has announced that they can produce VIVO RDF, and Elsevier has said they will in the future. While searches for people are an obvious requirement for researcher networking, we don't want to limit ourselves to searching for people. VIVO's ontology-based data model is not limited to profiles of people, but includes organizations, events, publications, grants, and many other types of data. This enables VIVO to represent the relationships among people and other types of data as an interconnected network that can be accessed in many ways. We’ll demo this exemplar later.
  • Another visualization (not shown) is available for co-investigator relationships.
  • ScienceMap. Examine collections of publications for individuals, work groups, and institutions.
  • The combination of high quality and semantic data means you can do some exciting things with that data. Here's a screenshot from an application Alex Rockwell created that allows users to map out relations in an organizational hierarchy.
  • Let's take a look at how this is done.
  • New applications that take advantage of VIVO's linked open data are being developed from within the project as well as from a broad open source community.
  • Miles Worthington. Image from Dr. Barend Mons, Scientific Director of the Netherlands Bioinformatics Institute. Allows experts to be found, but also ties the object to specific concepts.
  • This figure displays “collaborative publications” found at 2 or more Researcher Networking websites. The idea that institutions don't work together and that biomedical research is conducted in silos is not true. Researchers, even when separated by great distances, are in fact willing to work together, and this visualization demonstrates that they often do. Contact: Nick Benik (
  • We talked about Harvester. Here's a great tool for dealing with dirty data, one we developed at Weill Cornell. Google Refine + VIVO is an extension of Google Refine. It allows users to align a dataset to the VIVO core ontologies, assert relationships to help transform data from a grid to a semantic web graph, and export the data into a standardized triple format (such as N3) so that it can be easily ingested by a VIVO installation. We have used it to ingest data from our course management system.
  • Most people here are a fraction of an FTE.
  • VIVO chooses to meet these participation challenges with the help of libraries – neutral campus entities that understand their user communities and their research environment. Libraries are increasingly involved in IT decisions. Libraries have subject experts: many librarians who work to support VIVO also provide reference and instructional support to the agricultural sciences, basic sciences, social sciences, and humanities. Librarians have good knowledge of their user communities – they understand what drives their institution's research environment. Of course, librarians have a strong service ethic and a long history of providing academic support. They also facilitate the resolution of data integration issues endemic to legacy data/faculty reporting systems.
  • Harvester, a third-party app for getting data into VIVO, only took us so far because of author disambiguation issues in conjunction with PubMed. A Harvester that works with Scopus and Web of Science, which have author disambiguation built in (and where they do not, researchers can use a WoS ResearcherID or Scopus Author ID), is slated to be completed in the next 6 months.
  • Here is a model of where we get data. Call us lazy, but we are disinclined to manually key in anything. As we all know, that type of practice is just not sustainable, particularly when some of our prolific authors have 500 papers to their name…. In some cases (MSK and Hunter), we have yet to get data. Another case where we still need to get data is publications. We have a pilot project with Thomson to get publications in from Web of Science. Getting publications into VIVO remains the most important thing we can do.
  • Also….
  • Screenshot from a prototype.
  • Screenshot from a prototype. Ver 1.4
  • As you can see, the VIVO project itself is a rather large, geographically dispersed team: 7 institutions, with project areas in development, implementation, ontology, and outreach. Special thanks to Kristi Holmes from Washington University and Jon Corson-Rikert from Cornell University for help with the slides for this presentation.
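
The notes above describe a semantic assertion as adding content, stating what type of content it is, and stating its relationships to other data. A minimal sketch of that idea in plain Python follows, using the "Jane Smith" example from the slides. The `example.org` URIs and the `memberOf` and `authorOf` predicates are hypothetical placeholders, not VIVO's actual properties; only the RDF, RDFS, and FOAF URIs are published vocabulary terms. The helper functions are likewise illustrative and are not part of VIVO.

```python
# RDF, RDFS, and FOAF URIs below are published vocabulary terms.
# Everything under example.org (including memberOf/authorOf) is made up
# for illustration and is NOT taken from the VIVO ontology.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
EX = "http://example.org/"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


jane = uri(EX + "person/jane-smith")
genetics = uri(EX + "org/dept-of-genetics")
article = uri(EX + "pub/article-1")

triples = [
    (jane, uri(RDF_TYPE), uri(FOAF_PERSON)),         # what type of thing it is
    (jane, uri(RDFS_LABEL), literal("Jane Smith")),  # the content itself
    (jane, uri(EX + "memberOf"), genetics),          # relationship to an organization
    (jane, uri(EX + "authorOf"), article),           # relationship to a publication
]

print(to_ntriples(triples))
```

Each printed line is one subject-predicate-object statement. A real installation would use the classes and properties defined by the VIVO core ontologies rather than these placeholders.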

    1. 1. VIVO: Enabling the Discovery of Research and Scholarship <ul><li>COLLEEN CUDDY, MA, MLS, AHIP </li></ul><ul><li>PAUL ALBERT, MA, MLS </li></ul><ul><li>Weill Cornell Medical Library </li></ul>
    2. 2. BACKGROUND INFORMATION <ul><li>What is VIVO? </li></ul>
    3. 3. VIVO Defined <ul><li>VIVO promotes open standards and open linked data about research – people, papers/products, funding, events/presentations, resources, projects/studies, data, concepts – and the relationships between them. </li></ul><ul><li>VIVO is a set of open source, community-maintained software tools for research discovery and networking in science and other disciplines. </li></ul><ul><li>VIVO is a world community of collaborators – scientists, implementers, developers. </li></ul>
    4. 4. One Minute VIVO History <ul><li>2003 – VIVO created for local use at Cornell University (Ithaca) to support a university-wide life sciences initiative </li></ul><ul><li>2009 – The National Center for Research Resources (NIH) awards the VIVO Collaboration a two-year, $12.2 million grant to VIVO for networking of researchers. A parallel grant for collecting and networking research resources was awarded to the eagle-i Consortium. </li></ul><ul><li>2010 Apr – Version 1.0 released </li></ul><ul><li>2010 July – Version 1.1 released </li></ul><ul><li>2010 Aug – First VIVO conference (NYC) </li></ul><ul><li>2011 Feb – Version 1.2 and Harvester version 1.0 </li></ul><ul><li>2011 July – Version 1.3 released </li></ul><ul><li>2011 Aug – Second VIVO conference (D.C.) </li></ul><ul><li>2012 Aug – Third VIVO conference (Miami) </li></ul>
    5. 5. What One Can (and Potentially Can) Do With VIVO: Showcase credentials, expertise, skills, and professional achievements; Connect within focus areas and geographic expertise; Simplify reporting tasks and link data to external applications – e.g., to generate biosketches or CVs; Publish the URL or populate profiles on other websites; Find potential colleagues by research area, authorship, and collaborations; Display visualizations of complex research networks and relationships
    6. 6. Who can use VIVO?
    7. 7. Reuse Data Across Applications
    8. 8. Original VIVO Implementation Sites <ul><li>University of Florida – Principal Investigator </li></ul><ul><li>Cornell University </li></ul><ul><li>Indiana University </li></ul><ul><li>Ponce School of Medicine </li></ul><ul><li>The Scripps Research Institute </li></ul><ul><li>Washington University School of Medicine </li></ul><ul><li>Weill Cornell Medical College (WCMC) </li></ul>
    9. 9. Current Pilot Implementation Sites and Collaborators <ul><li>University of Florida – Principal Investigator </li></ul><ul><li>Cornell University </li></ul><ul><li>Indiana University </li></ul><ul><li>Ponce School of Medicine </li></ul><ul><li>The Scripps Research Institute </li></ul><ul><li>Washington University School of Medicine </li></ul><ul><li>Weill Cornell Medical College (WCMC) </li></ul>
    10. 10. VIVO Collaborators: Federal agencies – NIH, USDA, EPA, NSF, OSTP, FDP, STAR Metrics, NSF, NLM, VA, CENDI, …; Publishers and Aggregators – ORCID, Thomson Reuters, CiteSeer, arXiv, PLoS, BioMed Central, D-Space, Elsevier/Collexis, …; Press – Nature, Science, Cell, Genome Tech, SEED, The Chronicle, AP, Information Today, IEEE, Wired, …; Professional Societies – APA, AAAS, AIRI, AAMC, ABRF, APA, …; International – Australia, China, Netherlands, UK, Brazil, India, Costa Rica, Mexico, …; Semantic Web community – eagle-i, DERI, Tim Berners-Lee, MyExperiment, ConceptWeb, Open Pharma Space (EU), Linked Data, …; Social Network Analysis Community – Northwestern, Davis, UCF, …; Schools and Consortia – CTSAs, CIC, SURA, FLR, Iowa, Harvard, UCSF, Pittsburgh, Stanford, MIT, Brown, Michigan, Colorado, OHSU, Duke, Minnesota, many more; Sub-awards to Duke, Pittsburgh, Leicester, Stony Brook, Weill, Indiana; Other Collaborators – Symplectic Elements, Wellspring, …; Application and service providers – over 100; SourceForge file downloads – 10,000+ on SF since May 2010; Contact list – over 1,500
    11. 11. Is VIVO Facebook for Researchers? <ul><li>No, Mendeley is… (Ha) </li></ul>
    12. 12. KEY CONCEPTS OF VIVO <ul><li>What is VIVO? </li></ul>
    13. 13. Semantic representation of data
    14. 14. <ul><li>Data are structured in the form of “triples” as subject-predicate-object. </li></ul><ul><li>Information is stored using the Resource Description Framework (RDF). </li></ul><ul><li>Concepts and their relationships use a shared ontology to facilitate the harvesting of data from multiple sources. </li></ul>How does VIVO store data? [Diagram: the subject “Jane Smith” linked by the predicates “is member of”, “author of”, and “has affiliation with” to the objects “Dept. of Genetics”, “College of Medicine”, “Journal article”, “Book chapter”, “Book”, and “Genetics Institute”.]
    15. 15. <ul><li>Dublin core – description of resources </li></ul><ul><li>Event – arbitrary time/space regions </li></ul><ul><li>FOAF – people </li></ul><ul><li>Geopolitical – countries, regions </li></ul><ul><li>SKOS – thesauri, classification schemes, knowledge </li></ul><ul><li>BIBO – publications, citations </li></ul><ul><li>VIVO – additional classes and properties to meet VIVO application needs </li></ul>Selected Core Ontologies
    16. 16. <ul><li>Research (bibo:Document, vivo:Grant, vivo:Project, vivo:Software, vivo:Dataset, vivo:ResearchLaboratory) </li></ul><ul><li>Teaching (vivo:TeacherRole, vivo:AdvisingRelationship) </li></ul><ul><li>Services (vivo:Service, vivo:CoreLaboratory, vivo:MemberRole) </li></ul><ul><li>Expertise (vivo:SubjectArea) </li></ul>Example: modeling a person
    17. 17. Alignment with eagle-i ontology
    18. 18. Local Dataflow into VIVO [Diagram labels: local systems of record; external sources; data ingest; interactive input; ontologies (RDF); VIVO (RDF); RDF harvest; SPARQL endpoint; shared as RDF]
    19. 19. Linked Open Data RDF Triples RDF Triples
    20. 20. Community-Wide Semantic Search Search across VIVO sites
    21. 21. DATA VISUALIZATION <ul><li>What is VIVO? </li></ul>
    22. 22. Visualize co-author relationships
    23. 23. ScienceMap visualizes collections of publications
    24. 24. TOOLS & APPS FOR VIVO <ul><li>What is VIVO? </li></ul>
    25. 25. Draw organizational charts
    26. 26. VIVO produces both HTML and RDF Software reads VIVO RDF and displays
    27. 27. @mileswortho
    28. 29. Inter-institutional Collaboration Explorer
    29. 30. Power Tool for Dirty Data – Google Refine + VIVO
    30. 31. LIBRARIES AS KEY PARTNERS <ul><li>What is VIVO? </li></ul>
    31. 32. VIVO Team at WCMC <ul><li>Curtis Cole, MD – Principal Investigator </li></ul><ul><li>Paul Albert – Data Architect </li></ul><ul><li>Mark Bronnimann – Systems Administrator </li></ul><ul><li>Eliza Chan – Developer </li></ul><ul><li>Dan Dickinson – Implementation/Technical Lead </li></ul><ul><li>Kenneth Lee – Project Manager </li></ul><ul><li>Grace Migliorosi, PhD – Outreach Coordinator </li></ul><ul><li>John Ruffing – Partner Site Integration Specialist </li></ul>
    32. 33. Why Libraries? <ul><li>Are a trusted, neutral space </li></ul><ul><li>Have a tradition of service and support </li></ul><ul><li>Strive to serve all missions of the institution </li></ul><ul><li>Are technology centers and have IT and data expertise </li></ul><ul><li>Have skills—information organization, instruction, usability, subject expertise, ontologies and controlled vocabularies </li></ul><ul><li>Have close relationships with their clients (buy in) </li></ul><ul><li>Understand user needs </li></ul><ul><li>Understand the importance of collaboration and know how to bring people together </li></ul><ul><li>Have knowledge of institution, research, education, clinical landscape </li></ul>Library Staff: Libraries:
    33. 34. The Library’s Role in VIVO <ul><li>Identify target sources of data and set priorities </li></ul><ul><li>Resolve ambiguities between conflicting sources of data </li></ul><ul><li>Propose mappings to VIVO’s classes and properties </li></ul><ul><li>Propose standards of data formatting </li></ul><ul><li>Create local extensions to the ontology (e.g. institutional identifier) </li></ul><ul><li>Create and implement policies for end users </li></ul>
    34. 35. COMING SOON <ul><li>What is VIVO? </li></ul>
    35. 36. What’s next at Weill Cornell <ul><li>Publications, publications, publications!!! </li></ul><ul><li>Implement self-edit </li></ul><ul><li>Internal outreach and training </li></ul><ul><li>Internal operations support </li></ul><ul><li>Upgrade to VIVO 1.3 </li></ul><ul><li>Data exchange with CTSA partners – Hunter, Methodist, HSS, MSKCC </li></ul><ul><li>Establishing policies </li></ul>
    36. 37. WCMC/CTSA Sources of Data <ul><li>Local Systems of Record </li></ul><ul><ul><li>HR </li></ul></ul><ul><ul><li>RASP </li></ul></ul><ul><li>Data Aggregators and Repositories </li></ul><ul><ul><li>PubMed </li></ul></ul><ul><ul><li>Web of Science </li></ul></ul><ul><ul><li> </li></ul></ul><ul><li>Individuals or their Proxies </li></ul>
    37. 38. What’s next on the International Level <ul><li>Identify best practices for incorporating external data sources for publications and grants </li></ul><ul><li>Support easier customization and improved semantic search </li></ul><ul><li>Extend the ontology to support intellectual property, research datasets, and more detail about research resources </li></ul><ul><li>Identify and commit to a governance structure </li></ul><ul><li>Better provenance </li></ul><ul><li>and… </li></ul>
    38. 39. Full integration with Digital Vita CV
    39. 40. What’s next (National Level) <ul><li>Display visualizations of complex research networks and relationships. </li></ul><ul><li>Incorporate external data sources for publications and grants. </li></ul><ul><li>Support easier customization and improved semantic search. </li></ul><ul><li>Link data to external applications, e.g. generate CVs or biosketches. </li></ul><ul><li>Ability to search all VIVO sites/communities. </li></ul>Improved search relevance and faceting
    40. 41. DEMONSTRATION <ul><li>What is VIVO? </li></ul>
    41. 42. Learn More About VIVO <ul><li>Project – </li></ul><ul><li>Multi-site search (beta) – </li></ul><ul><li>Sourceforge – </li></ul><ul><li>Facebook – </li></ul><ul><li>Twitter – </li></ul>
    42. 43. Cornell University: Dean Krafft (Cornell PI) Manolo Bevia Jim Blake Nick Cappadona Brian Caruso Jon Corson-Rikert Elly Cramer Medha Devare Elizabeth Hines Huda Khan Deepak Konidena Brian Lowe Joseph McEnerney Holly Mistlebauer Stella Mitchell Anup Sawant Christopher Westling Tim Worrall Rebecca Younes. University of Florida: Mike Conlon (VIVO and UF PI) Beth Auten Michael Barbieri Chris Barnes Kaitlin Blackburn Cecilia Botero Kerry Britt Erin Brooks Amy Buhler Ellie Bushhousen Linda Butson Chris Case Christine Cogar Valrie Davis Mary Edwards Nita Ferree Rolando Garcia-Milan George Hack Chris Haines Sara Henning Rae Jesano Margeaux Johnson Meghan Latorre Yang Li Jennifer Lyon Paula Markes Hannah Norton James Pence Narayan Raum Nicholas Rejack Alexander Rockwell Sara Russell Gonzalez Nancy Schaefer Dale Scheppler Nicholas Skaggs Matthew Tedder Michele R. Tennant Alicia Turner Stephen Williams. Indiana University: Katy Borner (IU PI) Kavitha Chandrasekar Bin Chen Shanshan Chen Ryan Cobine Jeni Coffey Suresh Deivasigamani Ying Ding Russell Duhon Jon Dunn Poornima Gopinath Julie Hardesty Brian Keese Namrata Lele Micah Linnemeier Nianli Ma Robert H. McDonald Asik Pradhan Gongaju Mark Price Michael Stamper Yuyin Sun Chintan Tank Alan Walsh Brian Wheeler Feng Wu Angela Zoss. Ponce School of Medicine: Richard J. Noel, Jr. (Ponce PI) Ricardo Espada Colon Damaris Torres Cruz Michael Vega Negrón. This project is funded by the National Institutes of Health, U24 RR029822 “VIVO: Enabling National Networking of Scientists”. The Scripps Research Institute: Gerald Joyce (Scripps PI) Catherine Dunn Sam Katkov Brant Kelley Paula King Angela Murrell Barbara Noble Cary Thomas Michaeleen Trimarchi. Washington University School of Medicine in St. Louis: Rakesh Nagarajan (WUSTL PI) Kristi L. Holmes Caerie Houchins George Joseph Sunita B. Koul Leslie D. McIntosh. Weill Cornell Medical College: Curtis Cole (Weill PI) Paul Albert Victor Brodsky Mark Bronnimann Adam Cheriff Oscar Cruz Dan Dickinson Richard Hu Chris Huang Itay Klaz Kenneth Lee Peter Michelini Grace Migliorisi John Ruffing Jason Specland Tru Tran Vinay Varughese Virgil Wong. VIVO Collaboration
    43. 44. QUESTIONS? <ul><li>Thank You! </li></ul>
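
The "Linked Open Data" and "Community-Wide Semantic Search" slides describe pooling RDF triples from several VIVO sites and searching across them. The sketch below illustrates that idea only in miniature: it merges two small in-memory triple sets and matches a single triple pattern against them, roughly what one pattern in a SPARQL query does. It is not SPARQL and not how the multi-site search is actually implemented. The prefixed names (`a:jane`, `b:raj`, `ex:researchArea`) are hypothetical stand-ins for full URIs.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    A variable that appears twice must bind to the same value both times.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Two hypothetical sites, each publishing its own triples.
site_a = {
    ("a:jane", "rdf:type", "foaf:Person"),
    ("a:jane", "ex:researchArea", "Genetics"),
}
site_b = {
    ("b:raj", "rdf:type", "foaf:Person"),
    ("b:raj", "ex:researchArea", "Genetics"),
    ("b:lee", "ex:researchArea", "Oncology"),
}

# Because both sites share one data model, merging is a plain set union.
merged = site_a | site_b

people = sorted(
    b["?who"] for b in match(merged, ("?who", "ex:researchArea", "Genetics"))
)
print(people)  # ['a:jane', 'b:raj']
```

The point of the shared ontology is visible in the union step: no per-site translation is needed before the combined data can be queried.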