NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
1. NISO/DCMI Webinar:
Semantic Mashups Across
Large, Heterogeneous Institutions:
Experiences from the VIVO Service
May 22, 2013
Speaker:
John Fereira,
Senior Programmer/Analyst and
Technology Strategist at Cornell University
http://www.niso.org/news/events/2013/dcmi/vivo
2. Semantic mashups across
large, heterogeneous
institutions: experiences
from the VIVO service
John Fereira
Cornell University
3. Overview
• What is VIVO?
• History of VIVO
• High level Overview
• Ingesting Data into VIVO
• Exposing Data in VIVO
4. What is VIVO?
• VIVO is not an acronym
• A semantic web application that enables the discovery of
research and scholarship across disciplines in an
institution.
• VIVO enables collaboration and understanding across an
institution and among institutions – and not just for
scientists.
• A powerful search/browse functionality for locating people
and information within or across institutions.
5. What is VIVO?
• An ontology editor. VIVO includes a “vivo” ontology
which can be modified and extended
• An instance editor. Instances of classes such as a
Person, Organization, Event, etc. can be created,
modified, and deleted
• Content can also be brought into VIVO in automated
ways from local systems of record, such as HR,
grants, course, and faculty activity databases, or
from database providers such as publication
aggregators and funding agencies.
6. What is VIVO?
• VIVO is a content disseminator
• Views of People, Organizations, etc. can be highly
customized
• VIVO provides visualizations such as topic maps, co-
authorship networks
• Open data means other applications can use it
7. A brief History of VIVO
• 2003 – VIVO created for local use at Cornell University
for life sciences collaboration
• 2007 - Reimplemented using RDF, OWL, Jena and
SPARQL
• 2007 – Implemented at Cornell and University of
Florida as “production” systems
8. A brief History of VIVO
• 2009 - seven institutions received $12.2 million in
funding from the National Center for Research
Resources of the NIH to enable a national network of
scientists
• 2010 – Version 1.0 released as open source
• 2013 – Now at version 1.5.1
• 2013 – Transitioning from funded project to a
sustainable community open source project
9. A high level Overview
• Core ideas
• Searching/browsing
• Self editing
10. Core ideas
• Research and researchers should be discoverable
independently of administrative hierarchies
• Relationships are as interesting as the facts
• It’s the network, not just the nodes
• Static data models are too confining
• Granular data management allows multiple views and
re-purposing
• Discovery is improved by linking pages to surrounding
context
11. VIVO and Linked Open Data
• VIVO enables authoritative data about researchers to become
part of the Linked Open Data (LOD) cloud
Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl
12. Linked Data principles
Tim Berners-Lee:
▫ Use URIs as names for things
▫ Use HTTP URIs so that people can look up those names
▫ When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
▫ Include links to other URIs so that people can discover
more things
http://linkeddata.org
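A minimal sketch of the "look up a URI, get useful information" principle, assuming the server honours HTTP content negotiation and can return RDF/XML. The individual URI below is hypothetical, and the request is only built, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Hypothetical individual URI, used for illustration only.
req = build_rdf_request("http://vivo.example.com/individual/n123456")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against a live server.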
14. Searching and Browsing
• Triple store indexed into a SOLR instance
• Searches are against SOLR
• Instance data comes from triplestore
• An example…
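The split between the search index and the triple store can be sketched with plain Python data structures. This is a toy stand-in, not SOLR or Jena: the index is a dictionary built from labels, the triples are tuples, and the individual URIs and the second record are hypothetical.

```python
# Toy triple store: (subject, predicate, object) tuples. URIs are hypothetical.
TRIPLES = {
    ("http://vivo.example.com/individual/n1", "rdfs:label", "Pesce, Valeria"),
    ("http://vivo.example.com/individual/n1", "core:primaryEmail", "valeria.pesce@fao.org"),
    ("http://vivo.example.com/individual/n2", "rdfs:label", "Fereira, John"),
}


def build_index(triples):
    """Index label tokens to subject URIs (stand-in for indexing into SOLR)."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate == "rdfs:label":
            for token in obj.lower().replace(",", " ").split():
                index.setdefault(token, set()).add(subject)
    return index


def search(index, term):
    """Searches run against the index only and return matching URIs."""
    return sorted(index.get(term.lower(), set()))


def describe(triples, uri):
    """Instance data for a hit is read back from the triple store."""
    return {p: o for s, p, o in triples if s == uri}


index = build_index(TRIPLES)
for uri in search(index, "pesce"):
    print(uri, describe(TRIPLES, uri))
```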
25. Data, data, data
VIVO harvests much of its data automatically from
verified sources
• Reduces the need for manual input of data
• Provides an integrated and flexible source of publicly
visible data at an institutional level
Individuals may also edit and customize their profiles to
suit their professional needs
[Figure labels: External data sources; Internal data sources]
26. Ingesting data with the VIVO Harvester
• A pipeline of tools
• Tools are written in Java, using Jena APIs
• Can fetch data from a variety of data formats
• Data can be sanitized and disambiguated
• Data is ingested directly to the triple store…does not
require VIVO web app to be running
27. Harvesting Pipeline
• Fetcher/Parser
• Translate: maps intermediate RDF to “vivo” RDF
• Transfer to local triple store (Jena TDB)
• Disambiguate using Scoring/Matching
• Changenamespace (mint unique URIs)
• Diff with previous model to create subtractions
• Transfer to VIVO triple store
28. Fetching and Parsing
• Fetches data from a URL, Database, local file
• Many different types of fetchers
▫ CSV fetcher
▫ JDBC fetcher
▫ SimpleXMLFetcher
▫ JSONFetcher
• Output is intermediate RDF Format, one file per
record
• “Fake” namespace used
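As a rough illustration of the fetch step, the sketch below turns CSV rows into one intermediate RDF/XML document per record under a placeholder namespace. It is written in Python for brevity and is not the Harvester's Java CSV fetcher; the namespace value and function name are assumptions, and column names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder ("fake") field namespace; a hypothetical value for this sketch.
FIELD_NS = "http://vivo.example.com/harvest/demo/fields/person/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("node-person", FIELD_NS)


def csv_to_records(text):
    """Return one RDF/XML string per CSV row, one element per non-empty column."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        root = ET.Element(f"{{{RDF_NS}}}RDF")
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}ID": f"node_-_{i}"}
        )
        for column, value in row.items():
            if value:
                ET.SubElement(desc, f"{{{FIELD_NS}}}{column}").text = value
        records.append(ET.tostring(root, encoding="unicode"))
    return records


sample = "FirstName,LastName,Email\nValeria,Pesce,valeria.pesce@fao.org\n"
records = csv_to_records(sample)
print(records[0])
```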
29. <?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
xml:base="http://vivo.example.com/harvest/aims_users/person">
<rdf:Description rdf:ID="node_-_0">
<rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
<node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
<node-person:Website>http://www.valeriapesce.name</node-person:Website>
<node-person:Nid>108074</node-person:Nid>
<node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
<node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
<node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
<node-person:LastName>Pesce</node-person:LastName>
<node-person:Country>Italy</node-person:Country>
<node-person:Email>valeria.pesce@fao.org</node-person:Email>
<node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
<node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
<node-person:Username>valeria.pesce</node-person:Username>
<node-person:FirstName>Valeria</node-person:FirstName>
<node-person:Role>Information Management Specialist</node-person:Role>
<node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
</rdf:Description>
</rdf:RDF>
30. Translate
• Map “fake” namespace to VIVO classes and
properties
• Uses XSLT transform
• Unique ID for each record
• node-person:Organization becomes
foaf:Organization
• Relationships created
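The mapping idea can be sketched as follows. The Harvester does this with an XSLT transform; this sketch swaps in Python's ElementTree and a small lookup table instead, handles only a few person properties, and leaves out the organization and position relationships. The `PROPERTY_MAP` table and `translate` function are hypothetical names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
CORE = "http://vivoweb.org/ontology/core#"
FIELDS = "http://vivo.example.com/harvest/aims_users/fields/person/"
PERSON_BASE = "http://vivo.example.com/harvest/aims_users/person/"

# Placeholder field name -> target property (illustrative subset).
PROPERTY_MAP = {
    "FirstName": f"{{{FOAF}}}firstName",
    "LastName": f"{{{FOAF}}}lastName",
    "Email": f"{{{CORE}}}primaryEmail",
}


def translate(record_xml):
    """Map one intermediate record to a foaf:Person description element."""
    src = ET.fromstring(record_xml).find(f"{{{RDF}}}Description")
    prefix = f"{{{FIELDS}}}"
    fields = {c.tag[len(prefix):]: (c.text or "") for c in src if c.tag.startswith(prefix)}
    # The record's Nid field supplies the unique ID in the subject URI.
    out = ET.Element(
        f"{{{RDF}}}Description",
        {f"{{{RDF}}}about": f"{PERSON_BASE}uid-{fields['Nid']}"},
    )
    ET.SubElement(out, f"{{{RDF}}}type", {f"{{{RDF}}}resource": f"{FOAF}Person"})
    ET.SubElement(out, f"{{{RDFS}}}label").text = f"{fields['LastName']}, {fields['FirstName']}"
    for name, target in PROPERTY_MAP.items():
        if name in fields:
            ET.SubElement(out, target).text = fields[name]
    return out


record = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:node-person="{FIELDS}">
  <rdf:Description rdf:ID="node_-_0">
    <node-person:Nid>108074</node-person:Nid>
    <node-person:FirstName>Valeria</node-person:FirstName>
    <node-person:LastName>Pesce</node-person:LastName>
    <node-person:Email>valeria.pesce@fao.org</node-person:Email>
  </rdf:Description>
</rdf:RDF>"""

person = translate(record)
print(ET.tostring(person, encoding="unicode"))
```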
31. Translated RDF
<rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<rdfs:label>Pesce, Valeria</rdfs:label>
<core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
<foaf:firstName>Valeria</foaf:firstName>
<foaf:lastName>Pesce</foaf:lastName>
<core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
<core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
</rdf:Description>
<rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
<rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
<core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
<core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
</rdf:Description>
32. Transfer
• Load RDF into TDB triplestore
• Duplicate URIs are not loaded
• Further operations are made in the triple store
33. Scoring/Match
• Disambiguates People, Organizations, etc. based
upon property values
• Supports Equality, NameCompare,
NormalizedLevenshteinDifference, Soundex
algorithms
• Each property is weighted
▫ firstName: 0.5
▫ lastName: 0.5
▫ Email: 1.0
• MatchThreshHold: 1.0
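One plausible reading of the weighted scoring is sketched below. It assumes the per-property results are combined as a weighted sum and compared with the threshold, which the slide does not spell out. The comparator assignment, the similarity normalisation, and the function names are all assumptions rather than the Harvester's implementation.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_similarity(a, b):
    """1.0 for identical strings, falling towards 0.0 as edit distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def equality(a, b):
    return 1.0 if a == b else 0.0


# Weights and threshold as given on the slide; comparator choice is assumed.
WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "Email": 1.0}
COMPARATORS = {
    "firstName": normalized_similarity,
    "lastName": normalized_similarity,
    "Email": equality,
}
MATCH_THRESHOLD = 1.0


def score(incoming, existing):
    """Weighted sum over properties that are present in both records."""
    total = 0.0
    for prop, weight in WEIGHTS.items():
        a, b = incoming.get(prop), existing.get(prop)
        if a and b:
            total += weight * COMPARATORS[prop](a, b)
    return total


incoming = {"firstName": "Valeria", "lastName": "Pesce", "Email": "valeria.pesce@fao.org"}
existing = {"firstName": "Valeria", "lastName": "Pesce"}
print(score(incoming, existing) >= MATCH_THRESHOLD)
```

With these weights, either an exact email match or an exact first-and-last-name match is enough on its own to reach the threshold.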
34. Matching
• Determines what should be done with a record
which matches another record based upon its
“score”
▫ Replace old record
▫ Merge records
▫ Ignore record
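The three outcomes can be sketched as a small dispatch function. Records are plain dictionaries here, and the merge rule (existing values win, incoming values fill gaps) is an assumption made for illustration; the function and parameter names are hypothetical.

```python
def resolve(existing, incoming, match_score, threshold=1.0, action="merge"):
    """Return the records that remain after handling a possible match."""
    if match_score < threshold:
        # Not a match: keep both as separate records.
        return [existing, incoming]
    if action == "replace":
        return [incoming]
    if action == "merge":
        # Assumed rule: existing values win, incoming fills missing properties.
        return [{**incoming, **existing}]
    if action == "ignore":
        return [existing]
    raise ValueError(f"unknown action: {action}")


old = {"lastName": "Pesce"}
new = {"lastName": "Pesce", "Email": "valeria.pesce@fao.org"}
print(resolve(old, new, match_score=1.0, action="merge"))
```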
35. ChangeNameSpace
• Match old namespace pattern in configuration file
http://vivo.example.com/harvest/aims_users/person/
• Specify namespace in VIVO
http://agrivivodev.mannlib.cornell.edu/vivo/individual/
• Mint a new URI in the vivo namespace
http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
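A minimal sketch of the renaming step, assuming new URIs are minted as `n` followed by a random number that is not already in use. The minting scheme and the function names are assumptions; the two namespaces are the ones shown above.

```python
import random

OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"


def mint_uri(used, rng):
    """Pick an unused n<number> URI in the VIVO namespace and reserve it."""
    while True:
        candidate = f"{NEW_NS}n{rng.randrange(1, 10**6)}"
        if candidate not in used:
            used.add(candidate)
            return candidate


def change_namespace(uris, used, rng):
    """Map each URI in the old namespace to a freshly minted one."""
    mapping = {}
    for uri in uris:
        if uri.startswith(OLD_NS) and uri not in mapping:
            mapping[uri] = mint_uri(used, rng)
    return mapping


harvested = [
    OLD_NS + "uid-108074",
    "http://aims.fao.org/aos/geopolitical.owl#Italy",  # outside OLD_NS: untouched
]
mapping = change_namespace(harvested, used=set(), rng=random.Random(0))
print(mapping)
```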
36. Diff of previous harvest
• Compare TDB model with previous harvest
• Generate vivo-additions.rdf
• Generate vivo-subtractions.rdf
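Treating each model as a set of triples, the diff reduces to two set differences. The sketch below uses plain Python tuples in place of Jena models, with hypothetical shortened URIs. It also shows that removing the subtractions and then adding the additions brings a store from the previous state to the current one.

```python
def diff_models(previous, current):
    """Return (additions, subtractions) between two sets of triples."""
    additions = current - previous
    subtractions = previous - current
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Remove retracted triples first, then add the new ones."""
    return (store - subtractions) | additions


# Hypothetical triples with shortened identifiers.
previous = {
    ("n1", "foaf:lastName", "Pesce"),
    ("n1", "core:primaryEmail", "old.address@example.org"),
}
current = {
    ("n1", "foaf:lastName", "Pesce"),
    ("n1", "core:primaryEmail", "valeria.pesce@fao.org"),
}

additions, subtractions = diff_models(previous, current)
print(sorted(additions))
print(sorted(subtractions))
```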
37. Final Transfer
• Load vivo-subtractions.rdf file into SDB
• Load vivo-additions.rdf file into SDB
38. Data Ingest alternatives
• Karma: an information integration tool which
provides a GUI for modeling data into an ontology
• Google Refine: Good for one time ingests and has a
VIVO RDF plugin
• VIVO admin tools can load RDF
39. Exposing Data in VIVO
• VIVO web pages
• View data as RDF
• Query a SPARQL endpoint and transform results
• Drupal front end
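For the SPARQL route, a consumer typically sends a query and reshapes the result set. The sketch below assumes the endpoint returns the standard SPARQL JSON results format. The query, the sample response, and the individual URI are illustrative, and no network call is made.

```python
import json

# Illustrative query; it would be sent to the endpoint by an HTTP client.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE { ?person a foaf:Person ; rdfs:label ?label . } LIMIT 10
"""

# Hand-written sample response in SPARQL JSON results format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["person", "label"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://vivo.example.com/individual/n123456"},
     "label": {"type": "literal", "value": "Pesce, Valeria"}}
  ]}
}
"""


def flatten(results_json):
    """Turn SPARQL JSON bindings into a list of {variable: value} dictionaries."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


rows = flatten(SAMPLE_RESPONSE)
print(rows)
```

The flattened rows can then be handed to whatever renders the page, for example a template in a separate front end.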
46. VIVO 1.0 source code was publicly released on April 14, 2010
87 downloads by June 11, 2010. 917 downloads on July 16, 2010.
The more institutions adopt VIVO, the more high quality data will be available to understand, navigate,
manage, utilize, and communicate progress in science and technology.
06/2010
63. NISO/DCMI Webinar
Semantic Mashups Across Large, Heterogeneous
Institutions: Experiences from the VIVO Service
NISO/DCMI Webinar • May 22, 2013
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2013/dcmi/vivo
Editor's Notes
Authoritative data, diverse formats, filter out private information. Talk about verified data. Talking points: Much of the data in VIVO profiles is ingested from authoritative sources so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO. Only public information will be stored and displayed. Data is housed and maintained at the local institutions. There it can be updated on a regular basis. There are three ways to get data: internal, external, individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution.
Co-author vis: An at-a-glance view of an individual's collaboration space. Who do they collaborate with most often? Do they always work with the same people, or do they work with multiple separate communities? Links increase in size and color with more frequent collaboration. Co-authors are clustered into communities. Users can explore the social network by traveling to co-authors' pages.
Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.