 NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
  • Authoritative data, diverse formats, filter out private information. Talk about verified data. Talking points: Much of the data in VIVO profiles is ingested from authoritative sources so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO. Only public information will be stored and displayed. Data is housed and maintained at the local institutions. There it can be updated on a regular basis. There are three ways to get data: internal, external, individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution.
  • Co-author vis: An at-a-glance view of an individual's collaboration space. Who do they collaborate with most often? Do they always work with the same people, or do they work with multiple separate communities? Links increase in size and color with more frequent collaboration. Co-authors are clustered into communities. Users can explore the social network by traveling to co-authors' pages.
  • Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.

    1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
       May 22, 2013
       Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University
       http://www.niso.org/news/events/2013/dcmi/vivo
    2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service
       John Fereira, Cornell University
    3. Overview
       • What is VIVO?
       • History of VIVO
       • High level Overview
       • Ingesting Data into VIVO
       • Exposing Data in VIVO
    4. What is VIVO?
       • VIVO is not an acronym
       • A semantic web application that enables the discovery of research and scholarship across disciplines in an institution.
       • VIVO enables collaboration and understanding across an institution and among institutions – and not just for scientists.
       • A powerful search/browse functionality for locating people and information within or across institutions.
    5. What is VIVO?
       • An ontology editor. VIVO includes a “vivo” ontology which can be modified and extended
       • An instance editor. Instances of classes such as a Person, Organization, Event, etc. can be created, modified, and deleted
       • Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.
    6. What is VIVO?
       • VIVO is a content disseminator
       • Views of People, Organizations, etc. can be highly customized
       • VIVO provides visualizations such as topic maps, co-authorship networks
       • Open data means other applications can use it
    7. A brief History of VIVO
       • 2003 – VIVO created for local use at Cornell University for life sciences collaboration
       • 2007 – Reimplemented using RDF, OWL, Jena and SPARQL
       • 2007 – Implemented at Cornell and University of Florida as “production” systems
    8. A brief History of VIVO
       • 2009 – Seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists
       • 2010 – Version 1.0 released as open source
       • 2013 – Now at version 1.5.1
       • 2013 – Transitioning from funded project to a sustainable community open source project
    9. A high level Overview
       • Core ideas
       • Searching/browsing
       • Self editing
    10. Core ideas
       • Research and researchers should be discoverable independently of administrative hierarchies
       • Relationships are as interesting as the facts
       • It’s the network, not just the nodes
       • Static data models are too confining
       • Granular data management allows multiple views and re-purposing
       • Discovery is improved by linking pages to surrounding context
    11. VIVO and Linked Open Data
       • VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud
       Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl
    12. Linked Data principles
       Tim Berners-Lee:
         ▫ Use URIs as names for things
         ▫ Use HTTP URIs so that people can look up those names
         ▫ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
         ▫ Include links to other URIs so that people can discover more things
       http://linkeddata.org
    13. VIVO in the LOD cloud
    14. Searching and Browsing
       • Triple store indexed into a Solr instance
       • Searches are against Solr
       • Instance data comes from the triple store
       • An example…
    15. Food security
    16. Self Editing
       • Users can edit their own profile
       • System can delegate editing to “proxy” editors
       • Some data can be locked
       • An example
    17. Editable and non-editable fields
    18. Most text fields support “rich text”
    19. External Concepts for “terms”
    20. Data Ingest (harvesting)
    21. VIVO harvests much of its data automatically from verified sources
       • Reduces the need for manual input of data
       • Provides an integrated and flexible source of publicly visible data at an institutional level
       • Individuals may also edit and customize their profiles to suit their professional needs
       Figure labels: Data, data, data; External data sources; Internal data sources
    22. Ingesting data with the VIVO Harvester
       • A pipeline of tools
       • Tools are written in Java, using Jena APIs
       • Can fetch data from a variety of data formats
       • Data can be sanitized and disambiguated
       • Data is ingested directly to the triple store… does not require the VIVO web app to be running
    23. Harvesting Pipeline
       • Fetcher/Parser
       • Translate: maps RDF to “vivo” RDF
       • Transfer to local triple store (Jena TDB)
       • Disambiguate using Scoring/Matching
       • ChangeNameSpace (mint unique URIs)
       • Diff with previous model to create subtractions
       • Transfer to VIVO triple store
    24. Fetching and Parsing
       • Fetches data from a URL, database, or local file
       • Many different types of fetchers
         ▫ CSV fetcher
         ▫ JDBC fetcher
         ▫ SimpleXMLFetcher
         ▫ JSONFetcher
       • Output is an intermediate RDF format, one file per record
       • “Fake” namespace used
    25. <?xml version="1.0"?>
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
                xml:base="http://vivo.example.com/harvest/aims_users/person">
         <rdf:Description rdf:ID="node_-_0">
           <rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
           <node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
           <node-person:Website>http://www.valeriapesce.name</node-person:Website>
           <node-person:Nid>108074</node-person:Nid>
           <node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
           <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
           <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
           <node-person:LastName>Pesce</node-person:LastName>
           <node-person:Country>Italy</node-person:Country>
           <node-person:Email>valeria.pesce@fao.org</node-person:Email>
           <node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
           <node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
           <node-person:Username>valeria.pesce</node-person:Username>
           <node-person:FirstName>Valeria</node-person:FirstName>
           <node-person:Role>Information Management Specialist</node-person:Role>
           <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
         </rdf:Description>
       </rdf:RDF>
    26. Translate
       • Map “fake” namespace to VIVO classes and properties
       • Uses XSLT transform
       • Unique ID for each record
       • node-person:Organization becomes foaf:Organization
       • Relationships created
    27. Translated RDF
       <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
         <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
         <rdfs:label>Pesce, Valeria</rdfs:label>
         <core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
         <foaf:firstName>Valeria</foaf:firstName>
         <foaf:lastName>Pesce</foaf:lastName>
         <core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
         <core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
       </rdf:Description>
       <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
         <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
         <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
         <core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
         <core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
       </rdf:Description>
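The Translate step can be pictured as a lookup from intermediate field names to VIVO properties. The Harvester does this with XSLT; the Python sketch below only illustrates the mapping idea. The `FIELD_MAP` table, the `translate` function, and the `core:` namespace URI are assumptions made for illustration (the slides do not show the namespace declarations).

```python
# Illustrative only: the real Harvester performs this step with an XSLT transform.
FOAF = "http://xmlns.com/foaf/0.1/"
CORE = "http://vivoweb.org/ontology/core#"  # assumed namespace for the core: prefix
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HARVEST = "http://vivo.example.com/harvest/aims_users/"

# Hypothetical mapping from "fake" namespace fields to VIVO properties.
FIELD_MAP = {
    "FirstName": FOAF + "firstName",
    "LastName": FOAF + "lastName",
    "Email": CORE + "primaryEmail",
}


def translate(record):
    """Turn one flat intermediate record (a dict) into a list of triples."""
    # The record's Nid gives each person a unique, repeatable URI.
    subject = HARVEST + "person/uid-" + record["Nid"]
    label = record["LastName"] + ", " + record["FirstName"]
    triples = [
        (subject, RDF_TYPE, FOAF + "Person"),
        (subject, RDFS + "label", label),
    ]
    for field, prop in FIELD_MAP.items():
        if field in record:
            triples.append((subject, prop, record[field]))
    return triples
```

Fields without an entry in the mapping table (Picture, Interests, and so on) are simply dropped in this sketch; a real transform would also create the organization and position resources shown on the slide.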
    28. Transfer
       • Load RDF into TDB triple store
       • Duplicate URIs are not loaded
       • Further operations are made in the triple store
    29. Scoring/Match
       • Disambiguates People, Organizations, etc. based upon property values
       • Supports Equality, NameCompare, NormalizedLevenshteinDifference, Soundex algorithms
       • Each property is weighted
         ▫ firstName: 0.5
         ▫ lastName: 0.5
         ▫ Email: 1.0
       • MatchThreshHold: 1.0
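The weighting on the Scoring slide can be worked through in a few lines. This is a minimal sketch using only the Equality comparison; the function names are hypothetical, and treating a score equal to the threshold as a match is an assumption, not something the slide states.

```python
# Weights and threshold taken from the slide; everything else is illustrative.
WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "email": 1.0}
MATCH_THRESHOLD = 1.0


def score(incoming, existing, weights=WEIGHTS):
    """Sum the weights of all properties whose values are exactly equal."""
    total = 0.0
    for prop, weight in weights.items():
        a, b = incoming.get(prop), existing.get(prop)
        if a is not None and a == b:
            total += weight
    return total


def is_match(incoming, existing):
    # Assumption: a score that reaches the threshold counts as a match.
    return score(incoming, existing) >= MATCH_THRESHOLD
```

With these weights, an identical email address is enough on its own to match, while names only match when both first and last name agree.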
    30. Matching
       • Determines what should be done with a record which matches another record based upon its “score”
         ▫ Replace old record
         ▫ Merge records
         ▫ Ignore record
    31. ChangeNameSpace
       • Match old namespace pattern in configuration file
         http://vivo.example.com/harvest/aims_users/person/
       • Specify namespace in VIVO
         http://agrivivodev.mannlib.cornell.edu/vivo/individual/
       • Mint a new URI in the VIVO namespace
         http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
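The ChangeNameSpace step amounts to replacing every URI in the harvest namespace with a fresh, unused URI in the VIVO namespace. The sketch below shows that idea only; the `change_namespace` function and its random `n<number>` scheme are assumptions modeled on the example URI above, not the Harvester's actual implementation.

```python
import random


def change_namespace(uris, old_ns, new_ns, in_use, rng=random):
    """Map each URI under old_ns to a newly minted, unused URI under new_ns.

    `in_use` is the set of URIs already present in the target store; it is
    updated in place so that no URI is minted twice.
    """
    mapping = {}
    for uri in uris:
        if not uri.startswith(old_ns):
            continue  # URIs outside the harvest namespace are left alone
        while True:
            candidate = f"{new_ns}n{rng.randrange(1, 10**6)}"
            if candidate not in in_use:
                break
        in_use.add(candidate)
        mapping[uri] = candidate
    return mapping
```

The returned mapping would then be applied to every triple that mentions an old URI, as subject or as object.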
    32. Diff of previous harvest
       • Compare TDB model with previous harvest
       • Generate vivo-additions.rdf
       • Generate vivo-subtractions.rdf
    33. Final Transfer
       • Load vivo-subtractions.rdf file into SDB
       • Load vivo-additions.rdf file into SDB
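If each model is treated as a set of triples, the diff and final transfer reduce to set differences. The sketch below is a simplified illustration with hypothetical function names; the Harvester itself works on Jena models and writes the two results to RDF files.

```python
def diff_models(previous, current):
    """Compare two harvests, each given as a set of (s, p, o) triples."""
    additions = current - previous      # new in this harvest
    subtractions = previous - current   # no longer present in the source
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Apply subtractions first, then additions, as in the final transfer."""
    return (store - subtractions) | additions
```

Because only the differences are loaded, triples that users edited directly in VIVO and that never came from the harvest are left untouched.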
    34. Data Ingest alternatives
       • Karma: an information integration tool which provides a GUI for modeling data into an ontology
       • Google Refine: good for one-time ingests and has a VIVO RDF plugin
       • VIVO admin tools can load RDF
    35. Exposing Data in VIVO
       • VIVO web pages
       • View data as RDF
       • Query a SPARQL endpoint and transform results
       • Drupal front end
    36. Default VIVO theme
    37. Cornell VIVO
    38. Griffith University
    39. Melbourne Find an Expert
    40. Visualization
       • Completed Work
         ▫ Co-Author visualization
         ▫ Sparklines
         ▫ VIVO world activity map
    41. VIVO 1.0 source code was publicly released on April 14, 2010. 87 downloads by June 11, 2010. 917 downloads on July 16, 2010. The more institutions adopt VIVO, the more high quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology. (06/2010)
    42. View RDF from profile page
    43. Requesting RDF using an Accept Header
       • curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
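The same content negotiation can be done from a script. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and decoding the response as UTF-8 is an assumption about the server.

```python
import urllib.request


def build_rdf_request(uri):
    """Build a GET request that asks the server for RDF/XML instead of HTML."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


def fetch_rdf(uri, timeout=30):
    """Send the request and return the response body as text (assumes UTF-8)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_rdf("http://vivo.ufl.edu/display/n25562")` would correspond to the curl command on the slide, provided the profile is still served at that address.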
    44. Retrieving data with SPARQL
       • Fuseki SPARQL endpoint installed (not included)
       • Callable with a SPARQL client
       • Semantic Services
         ▫ Manages custom SPARQL queries
         ▫ Exposes URL for external sites
         ▫ Can ask for output as HTML, XML, JSON
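A SPARQL client call is an ordinary HTTP request with the query passed as a parameter, as defined by the SPARQL 1.1 Protocol. The sketch below builds such a request without sending it. The endpoint URL and the query are hypothetical examples; the actual path depends on how the Fuseki dataset is configured.

```python
from urllib.parse import urlencode
import urllib.request

# Hypothetical endpoint: adjust to the dataset name of the local Fuseki install.
ENDPOINT = "http://localhost:3030/vivo/query"

# Example query: list ten people with their labels.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
```

Passing the result to `urllib.request.urlopen` would run the query; changing the Accept header to `application/sparql-results+xml` requests XML results instead.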
    45. Semantic Services application
    46. Hector Abruna in VIVO
    47. Hector Abruna on Chemistry Site
    48. Viewing VIVO data with Drupal
       • Import data with Feeds module and Linked Data Importer
       • Examples
    49. CALS Impact Statements
    50. AgriVIVO Home Page
    51. AgriVIVO map page
    52. AgriVIVO
    53. VivoSearch: search across multiple VIVO sites
    54. VIVO Searchlight bookmarklet
    55. VIVO Searchlight
    56. Some Links
       • Vivoweb
         ▫ http://vivoweb.org
       • Vivoweb on Sourceforge
         ▫ http://www.sourceforge.net/projects/vivo
       • VivoSearch
         ▫ http://vivosearch.org
       • VIVO Wiki on Duraspace
         ▫ https://wiki.duraspace.org/display/VIVO
       • Mailing Lists
         ▫ http://sourceforge.net/p/vivo/sfx-list/
    57. Thank you
    58. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
       NISO/DCMI Webinar • May 22, 2013
       Questions?
       All questions will be posted with presenter answers on the NISO website following the webinar: http://www.niso.org/news/events/2013/dcmi/vivo
    59. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you! THANK YOU