NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service

NISO/DCMI Webinar:
Semantic Mashups Across
Large, Heterogeneous Institutions:
Experiences from the VIVO Service
May 22, 2013
Speaker:
John Fereira,
Senior Programmer/Analyst and
Technology Strategist at Cornell University
http://www.niso.org/news/events/2013/dcmi/vivo

Semantic mashups across
large, heterogeneous
institutions: experiences
from the VIVO service
John Fereira
Cornell University

Overview
• What is VIVO?
• History of VIVO
• High-level Overview
• Ingesting Data into VIVO
• Exposing Data in VIVO

What is VIVO?
• VIVO is not an acronym
• A semantic web application that enables the discovery of
research and scholarship across disciplines in an
institution.
• VIVO enables collaboration and understanding across an
institution and among institutions – and not just for
scientists.
• A powerful search/browse functionality for locating people
and information within or across institutions.

What is VIVO?
• An ontology editor. VIVO includes a “vivo” ontology
which can be modified and extended
• An instance editor. Instances of classes such as a
Person, Organization, Event, etc. can be created,
modified, and deleted
• Content can also be brought into VIVO in automated
ways from local systems of record, such as HR,
grants, course, and faculty activity databases, or
from database providers such as publication
aggregators and funding agencies.

What is VIVO?
• VIVO is a content disseminator
• Views of People, Organizations, etc. can be highly
customized
• VIVO provides visualizations such as topic maps and
co-authorship networks
• Open data means other applications can use it

A brief History of VIVO
• 2003 – VIVO created for local use at Cornell University
for life sciences collaboration
• 2007 – Reimplemented using RDF, OWL, Jena and
SPARQL
• 2007 – Implemented at Cornell and University of
Florida as “production” systems

A brief History of VIVO
• 2009 – Seven institutions received $12.2 million in
funding from the National Center for Research
Resources of the NIH to enable a national network of
scientists
• 2010 – Version 1.0 released as open source
• 2013 – Now at version 1.5.1
• 2013 – Transitioning from funded project to a
sustainable community open source project

A high level Overview
• Core ideas
• Searching/browsing
• Self editing

Core ideas
• Research and researchers should be discoverable
independently of administrative hierarchies
• Relationships are as interesting as the facts
• It’s the network, not just the nodes
• Static data models are too confining
• Granular data management allows multiple views and
re-purposing
• Discovery is improved by linking pages to surrounding
context

VIVO and Linked Open Data
• VIVO enables authoritative data about researchers to become
part of the Linked Open Data (LOD) cloud
Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl

Linked Data principles
Tim Berners-Lee:
▫ Use URIs as names for things
▫ Use HTTP URIs so that people can look up those names
▫ When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
▫ Include links to other URIs so that people can discover
more things
http://linkeddata.org

Searching and Browsing
• Triple store indexed into a SOLR instance
• Searches are against SOLR
• Instance data comes from triplestore
• An example…

Self Editing
• Users can edit their own profile
• System can delegate editing to “proxy” editors
• Some data can be locked
• An example

Editable and non-editable fields

Most text fields support “rich text”

External Concepts for “terms”

Data, data, data
VIVO harvests much of its data automatically from
verified sources
• Reduces the need for manual input of data
• Provides an integrated and flexible source of publicly
visible data at an institutional level
Individuals may also edit and customize their profiles to
suit their professional needs
[Figure labels: External data sources; Internal data sources]

Ingesting data with the VIVO Harvester
• A pipeline of tools
• Tools are written in Java, using Jena APIs
• Can fetch data from a variety of data formats
• Data can be sanitized and disambiguated
• Data is ingested directly to the triple store…does not
require VIVO web app to be running

Harvesting Pipeline
• Fetcher/Parser
• Translate: maps intermediate RDF to “vivo” RDF
• Transfer to local triple store (Jena TDB)
• Disambiguate using Scoring/Matching
• ChangeNameSpace (mint unique URIs)
• Diff with previous model to create subtractions
• Transfer to VIVO triple store

Fetching and Parsing
• Fetches data from a URL, database, or local file
• Many different types of fetchers
▫ CSV fetcher
▫ JDBC fetcher
▫ SimpleXMLFetcher
▫ JSONFetcher
• Output is intermediate RDF Format, one file per
record
• “Fake” namespace used

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
xml:base="http://vivo.example.com/harvest/aims_users/person">
<rdf:Description rdf:ID="node_-_0">
<rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
<node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
<node-person:Website>http://www.valeriapesce.name</node-person:Website>
<node-person:Nid>108074</node-person:Nid>
<node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
<node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
<node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
<node-person:LastName>Pesce</node-person:LastName>
<node-person:Country>Italy</node-person:Country>
<node-person:Email>valeria.pesce@fao.org</node-person:Email>
<node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
<node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
<node-person:Username>valeria.pesce</node-person:Username>
<node-person:FirstName>Valeria</node-person:FirstName>
<node-person:Role>Information Management Specialist</node-person:Role>
<node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
</rdf:Description>
</rdf:RDF>
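
As a rough illustration of what a fetcher produces, the sketch below turns CSV rows into flat intermediate RDF/XML records shaped like the one above. It is not the Harvester's Java CSV fetcher: the function names `record_to_rdf` and `fetch_csv` and the sample CSV are assumptions, and only the namespaces are copied from the example record. It also assumes every column name is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# "Fake" harvest namespaces, copied from the example record above.
FIELD_NS = "http://vivo.example.com/harvest/aims_users/fields/person/"
TYPE_URI = "http://vivo.example.com/harvest/aims_users/types#person"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("node-person", FIELD_NS)


def record_to_rdf(row, index):
    """Serialize one CSV row as a flat rdf:Description (hypothetical helper)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}ID": f"node_-_{index}"}
    )
    ET.SubElement(desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": TYPE_URI})
    for column, value in row.items():
        if value:  # skip empty cells
            ET.SubElement(desc, f"{{{FIELD_NS}}}{column}").text = value
    return ET.tostring(root, encoding="unicode")


def fetch_csv(text):
    """Return one intermediate RDF/XML document per CSV record."""
    reader = csv.DictReader(io.StringIO(text))
    return [record_to_rdf(row, i) for i, row in enumerate(reader)]


# Illustrative input only; a real fetcher would read a file or URL.
sample = "Nid,FirstName,LastName\n108074,Valeria,Pesce\n"
records = fetch_csv(sample)
print(records[0])
```

Each column becomes a property in the "fake" field namespace, so no VIVO ontology knowledge is needed at this stage; that mapping is left to the Translate step.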

Translate
• Map “fake” namespace to VIVO classes and
properties
• Uses XSLT transform
• Unique ID for each record
• node-person:Organization becomes
foaf:Organization
• Relationships created

Translated RDF
<rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<rdfs:label>Pesce, Valeria</rdfs:label>
<core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
<foaf:firstName>Valeria</foaf:firstName>
<foaf:lastName>Pesce</foaf:lastName>
<core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
<core:positionInOrganization
rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
</rdf:Description>
<rdf:Description
rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
<rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
<core:organizationForPosition
rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
<core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
</rdf:Description>
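
The Harvester performs this step with an XSLT transform. The sketch below only mimics the mapping in Python so the idea can be tried without an XSLT processor. The `translate` function and the sample record are assumptions, the triples are simplified to plain string tuples, and the `core` namespace URI is the commonly published VIVO core namespace, which the slides do not show.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
CORE = "http://vivoweb.org/ontology/core#"  # assumption: not shown in the slides
FIELD = "http://vivo.example.com/harvest/aims_users/fields/person/"
BASE = "http://vivo.example.com/harvest/aims_users/"

# Shortened version of the fetched record, for illustration only.
FLAT_RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:node-person="{FIELD}">
  <rdf:Description rdf:ID="node_-_0">
    <node-person:Nid>108074</node-person:Nid>
    <node-person:FirstName>Valeria</node-person:FirstName>
    <node-person:LastName>Pesce</node-person:LastName>
    <node-person:Email>valeria.pesce@fao.org</node-person:Email>
    <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
  </rdf:Description>
</rdf:RDF>"""


def translate(flat_xml):
    """Map a flat harvested record to (subject, predicate, object) triples."""
    desc = ET.fromstring(flat_xml).find(f"{{{RDF}}}Description")

    def field(name):
        return (desc.findtext(f"{{{FIELD}}}{name}") or "").strip()

    # The record's own ID (Nid) gives each person a unique URI.
    person = f"{BASE}person/uid-{field('Nid')}"
    triples = [
        (person, RDF + "type", FOAF + "Person"),
        (person, RDFS + "label", f"{field('LastName')}, {field('FirstName')}"),
        (person, FOAF + "firstName", field("FirstName")),
        (person, FOAF + "lastName", field("LastName")),
        (person, CORE + "primaryEmail", field("Email")),
    ]
    org_name = field("Organization")
    if org_name:
        # A text field becomes a foaf:Organization resource plus a relationship.
        org = f"{BASE}org/{quote(org_name, safe='()')}"
        triples += [
            (org, RDF + "type", FOAF + "Organization"),
            (org, RDFS + "label", org_name),
            (person, CORE + "positionInOrganization", org),
        ]
    return triples


triples = translate(FLAT_RECORD)
for triple in triples:
    print(triple)
```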

Transfer
• Load RDF into TDB triplestore
• Duplicate URIs are not loaded
• Further operations are made in the triple store

Scoring/Match
• Disambiguates People, Organizations, etc. based
upon property values
• Supports Equality, NameCompare,
NormalizedLevenshteinDifference, Soundex
algorithms
• Each property is weighted
▫ firstName: 0.5
▫ lastName: 0.5
▫ Email: 1.0
• MatchThreshHold: 1.0
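
A minimal sketch of weighted scoring, using the weights and threshold from the slide. How the Harvester actually combines per-property scores is not stated here, so summing `weight * similarity` is an assumption, as are the function names and the choice of comparison per property. The Levenshtein similarity is assumed to be `1 - distance / longer length`.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_similarity(a, b):
    """1.0 for identical strings, falling toward 0.0 as they diverge."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def equality(a, b):
    return 1.0 if a == b else 0.0


# Weights and threshold from the slide; comparison functions are assumed.
RULES = {
    "firstName": (0.5, normalized_similarity),
    "lastName": (0.5, normalized_similarity),
    "email": (1.0, equality),
}
MATCH_THRESHOLD = 1.0


def score(record_a, record_b):
    """Sum the weighted similarity of every property both records have."""
    total = 0.0
    for prop, (weight, compare) in RULES.items():
        if prop in record_a and prop in record_b:
            total += weight * compare(record_a[prop].lower(), record_b[prop].lower())
    return total


def is_match(record_a, record_b):
    return score(record_a, record_b) >= MATCH_THRESHOLD


harvested = {"firstName": "Valeria", "lastName": "Pesce",
             "email": "valeria.pesce@fao.org"}
existing = {"firstName": "Valerie", "lastName": "Pesce",
            "email": "valeria.pesce@fao.org"}
print(score(harvested, existing), is_match(harvested, existing))
```

With these weights, an exact email match reaches the threshold on its own, and so does an exact first-name plus last-name match; near-miss names without a shared email fall short.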

Matching
• Determines what should be done with a record
that matches another record, based upon its
“score”
▫ Replace old record
▫ Merge records
▫ Ignore record

ChangeNameSpace
• Match old namespace pattern in configuration file
http://vivo.example.com/harvest/aims_users/person/
• Specify namespace in VIVO
http://agrivivodev.mannlib.cornell.edu/vivo/individual/
• Mint a new URI in the vivo namespace
http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
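
The renaming can be sketched as follows, assuming triples are held as plain string tuples. The two namespaces come from the slide; the `mint_uri` and `change_namespace` helpers and the random `n<number>` scheme are illustrative assumptions, not the Harvester's implementation.

```python
import random

OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"


def mint_uri(used, rng=random):
    """Return a URI in the VIVO namespace that is not in `used` yet."""
    while True:
        candidate = f"{NEW_NS}n{rng.randrange(1, 10**6)}"
        if candidate not in used:
            used.add(candidate)
            return candidate


def change_namespace(triples, used):
    """Replace every term in the harvest namespace with a minted URI.

    The same old URI always maps to the same new URI, so statements about
    one individual stay together. Simplification: any term starting with
    OLD_NS is renamed, including a literal that happens to look like one.
    """
    mapping = {}

    def rename(term):
        if term.startswith(OLD_NS):
            if term not in mapping:
                mapping[term] = mint_uri(used)
            return mapping[term]
        return term

    renamed = [(rename(s), p, rename(o)) for s, p, o in triples]
    return renamed, mapping


harvest = [
    (OLD_NS + "uid-108074", "http://xmlns.com/foaf/0.1/firstName", "Valeria"),
    (OLD_NS + "uid-108074", "http://xmlns.com/foaf/0.1/lastName", "Pesce"),
]
renamed, mapping = change_namespace(harvest, used=set())
print(mapping)
```

Passing in the set of URIs already present in VIVO as `used` is what keeps a minted URI from colliding with an existing individual.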

Diff of previous harvest
• Compare TDB model with previous harvest
• Generate vivo-additions.rdf
• Generate vivo-subtractions.rdf

Final Transfer
• Load vivo-subtractions.rdf file into SDB
• Load vivo-additions.rdf file into SDB
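
Conceptually, the diff and the final transfer are set operations over triples. The sketch below shows that idea with Python sets standing in for the Jena models; the function names and sample triples are assumptions.

```python
def diff_harvest(previous, current):
    """Compare two harvests and return (additions, subtractions)."""
    previous, current = set(previous), set(current)
    additions = current - previous      # new in this harvest
    subtractions = previous - current   # present last time, gone now
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Apply the diff in the order used above: subtractions, then additions."""
    updated = set(store)
    updated -= subtractions
    updated |= additions
    return updated


previous = {
    ("person/1", "email", "old@example.org"),
    ("person/1", "lastName", "Pesce"),
}
current = {
    ("person/1", "email", "new@example.org"),
    ("person/1", "lastName", "Pesce"),
}
additions, subtractions = diff_harvest(previous, current)
print(sorted(additions), sorted(subtractions))
```

Only the changed statements are sent to the production store, so unchanged data, including manual edits made in VIVO that never appeared in a harvest, is left alone.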

Data Ingest alternatives
• Karma: an information integration tool which
provides a GUI for modeling data into an ontology
• Google Refine: Good for one time ingests and has a
VIVO RDF plugin
• VIVO admin tools can load RDF

Exposing Data in VIVO
• VIVO web pages
• View data as RDF
• Query a SPARQL endpoint and transform results
• Drupal front end

Visualization
• Completed Work
▫ Co-Author visualization
▫ Sparklines
▫ VIVO world activity map

VIVO 1.0 source code was publicly released on April 14, 2010
87 downloads by June 11, 2010. 917 downloads on July 16, 2010.
The more institutions adopt VIVO, the more high-quality data will be available to understand, navigate,
manage, utilize, and communicate progress in science and technology.
06/2010

Requesting RDF using an Accept Header
• curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
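
The same content negotiation can be done from a script. This is a small sketch using only the Python standard library; the helper names are assumptions, and the example URI is the one from the slide, which may no longer resolve.

```python
from urllib.request import Request, urlopen


def rdf_request(uri):
    """Build a GET request that asks for RDF/XML instead of HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


def fetch_rdf(uri, timeout=10):
    """Dereference a VIVO individual URI and return the RDF/XML as text."""
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch_rdf() does.
request = rdf_request("http://vivo.ufl.edu/display/n25562")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```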

Retrieving data with SPARQL
• Fuseki SPARQL endpoint installed (not included)
• Callable with a SPARQL Client
• Semantic Services
▫ Manages custom SPARQL queries
▫ Exposes URL for external sites
▫ Can ask for output as html, xml, json
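
A client call against such an endpoint might look like the sketch below, which follows the standard SPARQL protocol (a `query` parameter plus an `Accept` header for the result format). The endpoint URL, the query, and the helper name are assumptions; a real deployment would use its own Fuseki address.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the address of your own Fuseki service.
ENDPOINT = "http://localhost:3030/vivo/query"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""

# Standard media types for SPARQL SELECT results.
FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
}


def sparql_request(query, fmt="json", endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request asking for the given format."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": FORMATS[fmt]})


request = sparql_request(QUERY)
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would execute the query; the JSON results can then be transformed into HTML for an external site.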

Hector Abruna on Chemistry Site

Viewing VIVO data with Drupal
• Import data with Feeds module and Linked Data
Importer
• Examples

VivoSearch: search across multiple
VIVO sites

Some Links
• Vivoweb
▫ http://vivoweb.org
• Vivoweb on Sourceforge
▫ http://www.sourceforge.net/projects/vivo
• VivoSearch
▫ http://vivosearch.org
• Vivo Wiki on Duraspace
▫ https://wiki.duraspace.org/display/VIVO
• Mailing Lists
▫ http://sourceforge.net/p/vivo/sfx-list/

NISO/DCMI Webinar
Semantic Mashups Across Large, Heterogeneous
Institutions: Experiences from the VIVO Service
NISO/DCMI Webinar • May 22, 2013
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2013/dcmi/vivo

Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU
