I work on an open-source, Semantic Web-based project called VIVO. Our work on VIVO is concerned with representing information about people. Of course, this includes the people we likely have in mind right now, who typically participate in some way in the scholarly environment. But increasingly we’ve also had a lot of interest from people who are more non-traditional, such as citizen scientists. It is important to represent a person’s interests, efforts, and areas of expertise. Who are they? What do they do? What do they study, and how do they contribute to our understanding of the world around us?
At each implementation, VIVO enables research discovery by providing verifiable information about research and researchers. Each institution provides its own VIVO system and data, and local governance determines which data are provided. Across institutions, VIVO provides a uniform semantic structure that enables a new class of tools to use the data to advance science.
What is VIVO? It’s a semantic web application with rich profiles that display publications, teaching, service, and professional affiliations, plus faceted search for fast and meaningful results. What do you mean by “Semantic Web”? A group of methods and technologies that allow machines to understand the meaning, or "semantics", of information on the World Wide Web.
Goal of VIVO: Improve all of science by providing the means for sharing and using current, accurate, and precise information regarding scientists’ interests, activities, and accomplishments. Foster team science by providing tools for identifying potential collaborators. Improve collaboration by creating tools that consume this data and repurpose it to enhance new and existing teams. VIVO is not limited to science: at Cornell, it covers all disciplines across the entire institution.
Profiles are largely created via automated data feeds, but can be customized to suit the needs of the individual. The information is open (free) and is stored in a framework that allows it to be exported to other applications. Profiles are richer in content than typical web pages or social networking sites and tend to rank higher in general internet searches.
VIVO harvests much of its data automatically from verified, authoritative sources, so the data are accurate and current, the need for manual input is reduced, and information is centralized in one integrated source. The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiency across the institution. Private or sensitive information is never imported into VIVO; only public information is stored and displayed. Data are housed and maintained at the local institutions, where they can be updated on a regular basis. Search results are faceted, so information can be located rapidly with less time spent sorting through results. So where do we get our information for VIVO? So far, agencies, repositories, and aggregators have been identified as sources for VIVO.
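Faceted search, as described above, comes down to counting search hits by field value and letting the user narrow the results by picking one value. The following is a minimal Python sketch under that assumption; the hit records, the field names (`type`, `department`), and the helper functions are hypothetical, and VIVO's actual search engine and index schema are not shown.

```python
from collections import Counter

# Hypothetical search hits; field names are illustrative, not VIVO's index schema.
hits = [
    {"name": "Jane Smith", "type": "Person", "department": "Genetics"},
    {"name": "Gene regulation in yeast", "type": "Publication", "department": "Genetics"},
    {"name": "Mike Jones", "type": "Person", "department": "Medicine"},
]


def facet_counts(results, field):
    """Count how many results carry each value of a facet field."""
    return Counter(r[field] for r in results if field in r)


def apply_facet(results, field, value):
    """Keep only the results whose facet field equals the selected value."""
    return [r for r in results if r.get(field) == value]


print(facet_counts(hits, "type"))
print([r["name"] for r in apply_facet(hits, "department", "Genetics")])
```

The counts are what a search page would display next to each facet value, and selecting a value simply re-filters the same hit list.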
Each element of a triple (subject, predicate, object) is governed by ontologies with defined semantics. VIVO 1.2 includes an ontology module representing research resources such as biological specimens, human studies, instruments, organisms, protocols, reagents, and research opportunities. This module is aligned with the top-level ontology classes and properties from the NIH-funded eagle-i Project (https://www.eagle-i.org/home/). We’re also developing extensions to the ontology that would allow more diverse types of efforts to be included in the profiles (blogs are currently available; wiki edits are not), which is very important for microattribution and nanopublication efforts.
VIVO uses linked open data concepts to provide data as RDF at a URI for each scientist, which is critically important for building a web of data. Predicates have addresses (URIs), and sites can point to objects in other triple stores, so queries can be resolved across triple stores, e.g. “show investigators whose genetic work is implicated in breast cancer.” VIVO won’t hold the linkages between genes and diseases such as breast cancer; other resources will. But VIVO can link to those external sources, as in the triple “Mike worksOn GeneY”. So where does data about interests, activities, and accomplishments come from? Archives, data aggregators, publishers, and institutional repositories. So now we turn to tools.
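To make the cross-store query concrete, here is a minimal Python sketch in which each triple store is just a set of (subject, predicate, object) tuples. All URIs, the `implicatedIn` predicate, and the `investigators_implicated_in` helper are hypothetical; only `worksOn` comes from the example above, and a real deployment would more likely issue SPARQL queries against each store. The join works only because both stores use the same URI for GeneY, which is the point of publishing data at stable URIs.

```python
# Hypothetical namespaces, for illustration only.
VIVO = "http://vivo.example.edu/individual/"
EXT = "http://bio.example.org/"

# A local VIVO store: who works on which gene.
vivo_store = {
    (VIVO + "Mike", VIVO + "worksOn", EXT + "GeneY"),
    (VIVO + "Ann", VIVO + "worksOn", EXT + "GeneZ"),
}

# An external store: which gene is implicated in which disease.
external_store = {
    (EXT + "GeneY", EXT + "implicatedIn", EXT + "BreastCancer"),
    (EXT + "GeneZ", EXT + "implicatedIn", EXT + "Asthma"),
}


def investigators_implicated_in(disease, local, remote):
    """Join the two stores on the shared gene URI."""
    # Step 1: ask the external store which genes are linked to the disease.
    genes = {s for (s, p, o) in remote if p == EXT + "implicatedIn" and o == disease}
    # Step 2: ask the local store who works on any of those genes.
    return sorted(s for (s, p, o) in local if p == VIVO + "worksOn" and o in genes)


print(investigators_implicated_in(EXT + "BreastCancer", vivo_store, external_store))
```

Neither store alone can answer the question; the shared URI for the gene is what lets the answer span both.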
The project has a strong open-source development component, reflected in part by the top-notch applications submitted in response to a recent call for applications by the project.
Miles Worthington. Image from Dr. Barend Mons, Scientific Director of the Netherlands Bioinformatics Institute. This allows experts to be found, but also ties the object to specific concepts.
Nick Benik at Harvard
There are many beautiful visualizations developed by Katy Borner’s group at Indiana University. These include co-author and co-investigator networks, and even temporal visualizations that allow discovery of grants and publications by defined groups over time, within and beyond an institution. Most recently, the visualization team implemented a Science Map visualization, which allows users to visually explore the scientific strengths of a university, school, department, or person in the VIVO instance. Users can see where an organization’s or person’s interests lie across 13 major scientific disciplines or 554 sub-disciplines, and how these disciplines and sub-disciplines interrelate on the map of science.
DEEP SEMANTIC SEARCH
While searches for people are an obvious requirement for researcher networking, we don't want to limit ourselves to searching for people. VIVO's ontology-based data model is not limited to profiles of people, but includes organizations, events, publications, grants, and many other types of data. This enables VIVO to represent the relationships among people and other types of data as an interconnected network that can be accessed in many ways.
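Because people, organizations, publications, and grants all live in one graph, "accessed in many ways" can be as simple as walking relationships outward from any starting node. The sketch below is a hypothetical breadth-first traversal in Python over plain tuples; the node names, predicates, and the `within_hops` helper are illustrative and do not represent VIVO's search implementation.

```python
from collections import deque

# Hypothetical graph mixing people, organizations, publications, and grants.
triples = [
    ("JaneSmith", "hasAffiliationWith", "DeptOfGenetics"),
    ("DeptOfGenetics", "isMemberOf", "CollegeOfMedicine"),
    ("JaneSmith", "authorOf", "Article1"),
    ("MikeJones", "authorOf", "Article1"),
    ("MikeJones", "investigatorOn", "Grant42"),
]


def neighbors(node, triples):
    """Yield (predicate, other_node) pairs, treating edges as undirected."""
    for s, p, o in triples:
        if s == node:
            yield p, o
        elif o == node:
            yield p, s


def within_hops(start, triples, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for _, nxt in neighbors(node, triples):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}


print(within_hops("JaneSmith", triples, 2))
```

Starting from `JaneSmith` with a limit of two hops reaches her department, its college, her article, and the co-author `MikeJones`; the grant only appears at three hops, through the co-author.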
Core project development is augmented by contributions and feedback from other developers across multiple institutions on SourceForge. The open source community around VIVO is robust and dedicated. SourceForge also offers an open environment to share materials and ideas related to implementation and adoption. More and more content is added every day.
As you can see, the VIVO project itself is a rather large, geographically dispersed team spanning 7 institutions. Project areas are development, implementation, ontology, and outreach. It is an inspiring, hard-working group of people, and I am grateful to know them and to collaborate with them on the project.
Transcript of "Facilitating Open Science and Research Discovery via VIVO and the Semantic Web"
Facilitating Open Science and Research Discovery via VIVO and the Semantic Web
Kristi Holmes, PhD
Bioinformaticist, Becker Medical Library
http://vivo.wustl.edu/display/n4754
Twitter: @kristiholmes
December 5, 2011
Facilitating Open Science and Research Discovery via VIVO and the Semantic Web by Kristi L. Holmes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Public, structured linked data about investigators’ interests, activities, and accomplishments, and tools to use that data to advance science
What is VIVO?
• An open-source semantic web application that enables the discovery of research and scholarship across disciplines in an institution.
• Populated with detailed profiles of faculty and researchers, displaying items such as publications, teaching, service, and professional affiliations.
• A powerful search functionality for locating people and information within or across institutions.
A VIVO profile allows you to:
• Find potential colleagues by research area, authorship, and collaborations.
• Showcase credentials, expertise, skills, and professional achievements.
• Connect within focus areas and geographic expertise.
• Simplify reporting tasks and link data to external applications, e.g., to generate biosketches or CVs.
• Publish the URL or link the profile to other applications.
• Display visualizations of complex research networks and relationships.
VIVO harvests data from verified sources
Faculty and unit administrators can then add additional information to their profile. (M)
External data sources (I):
• Publication warehouses, e.g. PubMed, Web of Science, and more
• Grant databases, e.g. NSF/NIH
• National Organizations: AAAS, AMA, etc.
Internal data sources (I):
• HR Directory
• Office of Sponsored Research
• Institutional Repositories
• Registrar System
• Faculty Activity Systems
• Events and Seminars
Data stored as RDF triples using a standard ontology
VIVO data is available for reuse by web pages, applications, and other consumers both within and outside the institution.
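The harvest step can be pictured as mapping each source record to triples. This is a minimal, hypothetical Python sketch, not VIVO's harvester: the record fields, the `hr_record_to_triples` function, the `http://vivo.example.edu/` base URI, and the `http://example.org/ontology#hasAffiliationWith` predicate are all illustrative assumptions; only the `rdf:type`, `rdfs:label`, and `foaf:Person` URIs are standard vocabulary. It also drops non-public fields before mapping, in keeping with the rule that private information is never imported.

```python
# Standard vocabulary URIs (RDF, RDF Schema, FOAF).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical namespaces for this sketch only.
BASE = "http://vivo.example.edu/individual/"
EX = "http://example.org/ontology#"

# Hypothetical allow-list of fields considered public.
PUBLIC_FIELDS = {"id", "name", "department"}


def hr_record_to_triples(record):
    """Map one (hypothetical) HR directory record to subject-predicate-object tuples."""
    # Keep only public fields; anything else (e.g. salary) never reaches the store.
    public = {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
    person = BASE + "person" + public["id"]
    triples = [
        (person, RDF_TYPE, FOAF_PERSON),
        (person, RDFS_LABEL, public["name"]),
    ]
    if "department" in public:
        # Mint a department URI from its name (hypothetical naming scheme).
        dept = BASE + "org-" + public["department"].lower().replace(" ", "-")
        triples.append((person, EX + "hasAffiliationWith", dept))
        triples.append((dept, RDFS_LABEL, public["department"]))
    return triples


jane_record = {"id": "1234", "name": "Jane Smith",
               "department": "Dept of Genetics", "salary": "100000"}
for triple in hr_record_to_triples(jane_record):
    print(triple)
```

Plain strings stand in for both URIs and literals here; a real RDF library would distinguish the two.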
How does VIVO store data? Information is stored using the Resource Description Framework (RDF), and data are structured in the form of “triples” as subject-predicate-object. Concepts and their relationships use a shared ontology to facilitate the harvesting of data from multiple sources.
[Figure: example triples (Subject, Predicate, Object): Jane Smith has affiliations with Dept. of Genetics and a Genetics Institute; Jane Smith is author of a Journal article, Book, and Book chapter; Dept. of Genetics is member of College of Medicine.]
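The subject-predicate-object structure turns querying into pattern matching. The minimal Python sketch below loosely follows the figure's example, with plain strings standing in for the URIs a real store would use; `match` is a hypothetical helper in which `None` acts as a wildcard, much like a variable in a SPARQL triple pattern.

```python
# Triples loosely based on the figure; strings stand in for URIs.
triples = {
    ("Jane Smith", "has affiliations with", "Dept. of Genetics"),
    ("Jane Smith", "author of", "Journal article"),
    ("Jane Smith", "author of", "Book"),
    ("Jane Smith", "author of", "Book chapter"),
    ("Dept. of Genetics", "is member of", "College of Medicine"),
}


def match(triples, s=None, p=None, o=None):
    """Return matching triples in sorted order; None matches any value."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which works is Jane Smith an author of?
for _, _, work in match(triples, s="Jane Smith", p="author of"):
    print(work)
```

Leaving a different position open answers a different question, e.g. `match(triples, o="Dept. of Genetics")` finds everything that points at the department.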
Using VIVO data
By storing data in VIVO in RDF and using standard ontologies, the information in VIVO can either be displayed in a human-readable web page or delivered directly to other systems as RDF. This allows the open researcher data in VIVO to be harvested, aggregated, and integrated into the Linked Open Data cloud, so authoritative data about researchers become part of the Linked Data cloud.
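One common way a linked-data server delivers the same resource as either a web page or RDF is HTTP content negotiation on the `Accept` header. The Python sketch below assumes that approach purely for illustration (it is not VIVO's code): it writes triples in N-Triples syntax when `application/n-triples` is requested and falls back to a bare HTML list otherwise. The individual URI is hypothetical; `rdfs:label` is a standard RDF Schema property.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def nt_term(term):
    """Serialize one term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape the characters that cannot appear raw inside an N-Triples string literal.
    text = (term.value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def render(triples, accept):
    """Pick a representation from an Accept header (simplified: no q-values)."""
    if "application/n-triples" in accept:
        return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n"
                       for s, p, o in triples)
    # Fallback for browsers: a minimal human-readable HTML list.
    items = "".join(f"<li>{escape(p.value)}: {escape(o.value)}</li>"
                    for _, p, o in triples)
    return f"<ul>{items}</ul>"


RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")
jane = URI("http://vivo.example.edu/individual/n1234")  # hypothetical URI
triples = [(jane, RDFS_LABEL, Literal("Jane Smith"))]

print(render(triples, "application/n-triples"), end="")
print(render(triples, "text/html"))
```

The same triples back both outputs, which is what lets one profile serve people and machines alike.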
The Semantic Web & Researcher Networking
• Increasing recognition of the value of semantic web standards
• Increasing momentum in support of semantic web technologies to facilitate research discovery
• Recommendations for researcher networking recently endorsed by the CTSA Consortium Steering Committee represent a new standard in researcher networking.
  – Read more at http://vivoweb.org/blog
• Examples of applications that consume these rich data include visualizations, enhanced multi-site search, and VIVO Searchlight. Other utilities are in development across a wide range of topic areas.
Notable SemWeb projects
• DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
• NextBio is a database consolidating high-throughput life sciences experimental data tagged and connected via biomedical ontologies.
• GoPubMed is a semantic search engine for the life sciences. It uses the Gene Ontology (GO) and the Medical Subject Headings (MeSH) to semantically filter millions of biomedical abstracts from MEDLINE.
• Open PHACTS will create an open innovative platform, Open Pharmacological Space, which will be freely accessible for knowledge discovery and verification. Open PHACTS will provide a growing body of data on small molecules, their pharmacological profiles, pharmacokinetics, biological targets, and pathways in a semantically interoperable format. Aligning and integrating proprietary and public data sources into a single system is currently a very difficult and time-consuming task, repeated across companies, institutes, and academic laboratories.
• Open Government initiatives
• Publications efforts
• DOD
• Federal Profiling
• Many others
open data, open tools, open process
Thank you!
VIVO Collaboration
Institutions and principal investigators:
• University of Florida: Mike Conlon (VIVO and UF PI)
• Indiana University: Katy Borner (IU PI)
• Cornell University: Dean Krafft (Cornell PI)
• Washington University School of Medicine in St. Louis: Rakesh Nagarajan (WUSTL PI)
• Weill Cornell Medical College: Curtis Cole (Weill PI)
• The Scripps Research Institute: Gerald Joyce (Scripps PI)
• Ponce School of Medicine: Richard J. Noel, Jr. (Ponce PI)
Team members: Beth Auten, Kavitha Chandrasekar, Michael Barbieri, Bin Chen, Chris Barnes, Shanshan Chen, Kaitlin Blackburn, Ryan Cobine, Cecilia Botero, Jeni Coffey, Kerry Britt, Suresh Deivasigamani, Erin Brooks, Ying Ding, Manolo Bevia, Amy Buhler, Russell Duhon, Jim Blake, Ellie Bushhousen, Jon Dunn, Nick Cappadona, Kristi L. Holmes, Linda Butson, Poornima Gopinath, Brian Caruso, Caerie Houchins, Chris Case, Julie Hardesty, Jon Corson-Rikert, George Joseph, Christine Cogar, Brian Keese, Elly Cramer, Sunita B. Koul, Valrie Davis, Namrata Lele, Medha Devare, Leslie D. McIntosh, Mary Edwards, Micah Linnemeier, Elizabeth Hines, Nita Ferree, Nianli Ma, Huda Khan, Rolando Garcia-Milan, Robert H. McDonald, Depak Konidena, George Hack, Asik Pradhan Gongaju, Brian Lowe, Paul Albert, Chris Haines, Mark Price, Joseph McEnerney, Victor Brodsky, Sara Henning, Michael Stamper, Holly Mistlebauer, Mark Bronnimann, Rae Jesano, Yuyin Sun, Stella Mitchell, Adam Cheriff, Margeaux Johnson, Chintan Tank, Anup Sawant, Oscar Cruz, Meghan Latorre, Alan Walsh, Christopher Westling, Dan Dickinson, Yang Li, Brian Wheeler, Tim Worrall, Richard Hu, Jennifer Lyon, Feng Wu, Rebecca Younes, Chris Huang, Paula Markes, Angela Zoss, Itay Klaz, Hannah Norton, Kenneth Lee, James Pence, Peter Michelini, Narayan Raum, Grace Migliorisi, Nicholas Rejack, John Ruffing, Ricardo Espada Colon, Alexander Rockwell, Jason Specland, Catherine Dunn, Damaris Torres Cruz, Sara Russell Gonzalez, Tru Tran, Sam Katkov, Michael Vega Negrón, Nancy Schaefer, Vinay Varughese, Brant Kelley, Dale Scheppler, Virgil Wong, Paula King, Nicholas Skaggs, Angela Murrell, Matthew Tedder, Barbara Noble, Michele R. Tennant, Alicia Turner, Cary Thomas, Michaeleen Trimarchi, Stephen Williams
This project is funded by the National Institutes of Health, U24 RR029822, "VIVO: Enabling National Networking of Scientists".
Acknowledgements
Funding:
• VIVO, NIH award U24 RR029822
• Washington University Institute of Clinical and Translational Sciences, NIH award UL1 RR024992
Collaborations:
• Washington University ICTS, Departments
• VIVO colleagues from across the country
• Becker Library colleagues
• Library colleagues everywhere
Questions:
• email@example.com
• Twitter: @kristiholmes
• http://vivo.wustl.edu/display/n4754
Thanks!