NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service


  • Authoritative data, diverse formats, filter out private information. Talk about verified data. Talking points: Much of the data in VIVO profiles is ingested from authoritative sources, so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO; only public information is stored and displayed. Data is housed and maintained at the local institutions, where it can be updated on a regular basis. There are three ways to get data: internal sources, external sources, and individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiency across the institution.
  • Co-author vis: An at-a-glance view of an individual's collaboration space. Who do they collaborate with most often? Do they always work with the same people, or do they work with multiple separate communities? Links increase in size and color with more frequent collaboration. Co-authors are clustered into communities. Users can explore the social network by traveling to co-authors' pages.
  • Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.
  • Transcript

    • 1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service. May 22, 2013. Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University
    • 2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service. John Fereira, Cornell University
    • 3. Overview • What is VIVO? • History of VIVO • High-level overview • Ingesting data into VIVO • Exposing data in VIVO
    • 4. What is VIVO? • VIVO is not an acronym • A semantic web application that enables the discovery of research and scholarship across disciplines in an institution. • VIVO enables collaboration and understanding across an institution and among institutions – and not just for scientists. • A powerful search/browse functionality for locating people and information within or across institutions.
    • 5. What is VIVO? • An ontology editor. VIVO includes a "vivo" ontology which can be modified and extended • An instance editor. Instances of classes such as a Person, Organization, Event, etc. can be created, modified, and deleted • Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.
    • 6. What is VIVO? • VIVO is a content disseminator • Views of People, Organizations, etc. can be highly customized • VIVO provides visualizations such as topic maps and co-authorship networks • Open data means other applications can use it
    • 7. A brief history of VIVO • 2003 – VIVO created for local use at Cornell University for life sciences collaboration • 2007 – Reimplemented using RDF, OWL, Jena, and SPARQL • 2007 – Implemented at Cornell and the University of Florida as "production" systems
    • 8. A brief history of VIVO • 2009 – Seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists • 2010 – Version 1.0 released as open source • 2013 – Now at version 1.5.1 • 2013 – Transitioning from a funded project to a sustainable community open source project
    • 9. A high-level overview • Core ideas • Searching/browsing • Self-editing
    • 10. Core ideas • Research and researchers should be discoverable independently of administrative hierarchies • Relationships are as interesting as the facts • It's the network, not just the nodes • Static data models are too confining • Granular data management allows multiple views and re-purposing • Discovery is improved by linking pages to surrounding context
    • 11. VIVO and Linked Open Data • VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud (Tim Berners-Lee)
    • 12. Linked Data principles (Tim Berners-Lee): ▫ Use URIs as names for things ▫ Use HTTP URIs so that people can look up those names ▫ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) ▫ Include links to other URIs so that people can discover more things
    • 13. VIVO in the LOD cloud
    • 14. Searching and browsing • Triple store indexed into a Solr instance • Searches are against Solr • Instance data comes from the triple store • An example…
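The search path described above reduces to an ordinary HTTP query against the Solr index. A minimal sketch in Python, assuming a hypothetical local Solr core (the host, port, and core name are illustrative, not VIVO defaults):

```python
from urllib.parse import urlencode

# Hypothetical Solr select handler; real VIVO installs configure their own.
SOLR_BASE = "http://localhost:8983/solr/vivocore/select"

def solr_search_url(text, rows=10):
    """Build a Solr free-text search URL (sketch only, not VIVO's code)."""
    params = {"q": text, "wt": "json", "rows": rows}
    return SOLR_BASE + "?" + urlencode(params)

url = solr_search_url("food security")
```

Instance data for each hit would then be fetched from the triple store, as the slide notes.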
    • 15. Food security
    • 16. Self-editing • Users can edit their own profile • System can delegate editing to "proxy" editors • Some data can be locked • An example
    • 17. Editable and non-editable fields
    • 18. Most text fields support “rich text”
    • 19. External Concepts for “terms”
    • 20. Data Ingest (harvesting)
    • 21. VIVO harvests much of its data automatically from verified sources • Reduces the need for manual input of data • Provides an integrated and flexible source of publicly visible data at an institutional level • Individuals may also edit and customize their profiles to suit their professional needs (diagram labels: "Data, data, data"; external data sources; internal data sources)
    • 22. Ingesting data with the VIVO Harvester • A pipeline of tools • Tools are written in Java, using Jena APIs • Can fetch data from a variety of data formats • Data can be sanitized and disambiguated • Data is ingested directly into the triple store… does not require the VIVO web app to be running
    • 23. Harvesting pipeline • Fetcher/Parser • Translate: maps RDF to "vivo" RDF • Transfer to local triple store (Jena TDB) • Disambiguate using Scoring/Matching • ChangeNamespace (mint unique URIs) • Diff with previous model to create subtractions • Transfer to VIVO triple store
    • 24. Fetching and parsing • Fetches data from a URL, database, or local file • Many different types of fetchers ▫ CSV fetcher ▫ JDBC fetcher ▫ SimpleXMLFetcher ▫ JSONFetcher • Output is an intermediate RDF format, one file per record • A "fake" namespace is used
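As a rough illustration of the fetch-and-parse idea (not the Harvester's actual Java code), a CSV fetcher can be reduced to: read rows, emit one record per row keyed under the "fake" namespace. The field names and sample values below are hypothetical:

```python
import csv
import io

def fetch_csv(text):
    """Toy CSV fetcher: one record per row, keys in a made-up namespace."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({f"node-person:{key}": value for key, value in row.items()})
    return records

sample = "FirstName,LastName,Email\nValeria,Pesce,vp@example.org\n"
records = fetch_csv(sample)
```

The real tool writes each record out as a small RDF/XML file like the one on the next slide.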
    • 25.
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="" xmlns:node-person="" xml:base="">
        <rdf:Description rdf:ID="node_-_0">
          <rdf:type rdf:resource=""/>
          <node-person:Picture></node-person:Picture>
          <node-person:Website></node-person:Website>
          <node-person:Nid>108074</node-person:Nid>
          <node-person:Profile>In the last six years at the Global Forum on Agricultural Research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
          <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
          <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
          <node-person:LastName>Pesce</node-person:LastName>
          <node-person:Country>Italy</node-person:Country>
          <node-person:Email></node-person:Email>
          <node-person:geolocation></node-person:geolocation>
          <node-person:Profile_URL></node-person:Profile_URL>
          <node-person:Username>valeria.pesce</node-person:Username>
          <node-person:FirstName>Valeria</node-person:FirstName>
          <node-person:Role>Information Management Specialist</node-person:Role>
          <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
        </rdf:Description>
      </rdf:RDF>
    • 26. Translate • Map the "fake" namespace to VIVO classes and properties • Uses an XSLT transform • Unique ID for each record • node-person:Organization becomes foaf:Organization • Relationships created
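The Harvester performs this mapping with an XSLT stylesheet; the underlying idea can be sketched as a simple lookup from the "fake" namespace onto FOAF/VIVO property names. The mapping entries below are illustrative, not the project's actual stylesheet:

```python
# Illustrative subset of a fake-namespace -> VIVO/FOAF property mapping.
PROPERTY_MAP = {
    "node-person:FirstName": "foaf:firstName",
    "node-person:LastName": "foaf:lastName",
    "node-person:Email": "core:primaryEmail",
    "node-person:Organization": "foaf:Organization",
}

def translate(record):
    """Keep only mapped fields, renamed to their target properties."""
    return {PROPERTY_MAP[k]: v for k, v in record.items() if k in PROPERTY_MAP}
```

Unmapped fields are simply dropped here; the real transform also mints relationships between the resulting resources.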
    • 27. Translated RDF
      <rdf:Description rdf:about="">
        <rdf:type rdf:resource=""/>
        <rdfs:label>Pesce, Valeria</rdfs:label>
        <core:currentMemberOf rdf:resource=""/>
        <foaf:firstName>Valeria</foaf:firstName>
        <foaf:lastName>Pesce</foaf:lastName>
        <core:primaryEmail></core:primaryEmail>
        <core:positionInOrganization rdf:resource=""/>
      </rdf:Description>
      <rdf:Description rdf:about="">
        <rdf:type rdf:resource=""/>
        <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
        <core:organizationForPosition rdf:resource=""/>
        <core:hasGeographicLocation rdf:resource=""/>
      </rdf:Description>
    • 28. Transfer • Load RDF into TDB triple store • Duplicate URIs are not loaded • Further operations are made in the triple store
    • 29. Scoring/Match • Disambiguates People, Organizations, etc. based upon property values • Supports Equality, NameCompare, NormalizedLevenshteinDifference, and Soundex algorithms • Each property is weighted ▫ firstName: 0.5 ▫ lastName: 0.5 ▫ Email: 1.0 • MatchThreshHold: 1.0
    • 30. Matching • Determines what should be done with a record which matches another record based upon its "score" ▫ Replace old record ▫ Merge records ▫ Ignore record
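Using the weights and threshold from the previous slide, the scoring-and-matching logic can be sketched as follows. Only the Equality comparator is shown, and the simplified field names are mine, not the Harvester's configuration keys:

```python
# Weights and threshold taken from the slide; comparison is exact
# equality only (the real Harvester also offers NameCompare,
# NormalizedLevenshteinDifference, and Soundex comparators).
WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "email": 1.0}
MATCH_THRESHOLD = 1.0

def score(a, b):
    """Sum the weights of fields that are present and equal in both records."""
    return sum(w for f, w in WEIGHTS.items() if a.get(f) and a.get(f) == b.get(f))

def is_match(a, b):
    """A record matches when its score reaches the threshold."""
    return score(a, b) >= MATCH_THRESHOLD
```

A matching record would then be replaced, merged, or ignored according to the configured policy.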
    • 31. ChangeNameSpace • Match old namespace pattern in configuration file • Specify namespace in VIVO • Mint a new URI in the vivo namespace
    • 32. Diff of previous harvest • Compare TDB model with previous harvest • Generate vivo-additions.rdf • Generate vivo-subtractions.rdf
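Treating each harvest as a set of triples, the diff step reduces to two set differences; this minimal sketch mirrors the additions/subtractions files the Harvester writes (the triples are made up for illustration):

```python
def diff(previous, current):
    """Return (additions, subtractions) between two sets of triples."""
    return current - previous, previous - current

prev = {("ex:p1", "foaf:lastName", "Pesce")}
curr = {("ex:p1", "foaf:lastName", "Pesce"),
        ("ex:p1", "foaf:firstName", "Valeria")}
adds, subs = diff(prev, curr)
# adds holds the new firstName triple; subs is empty for this harvest.
```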
    • 33. Final transfer • Load the vivo-subtractions.rdf file into SDB • Load the vivo-additions.rdf file into SDB
    • 34. Data ingest alternatives • Karma: an information integration tool which provides a GUI for modeling data into an ontology • Google Refine: good for one-time ingests and has a VIVO RDF plugin • VIVO admin tools can load RDF
    • 35. Exposing data in VIVO • VIVO web pages • View data as RDF • Query a SPARQL endpoint and transform results • Drupal front end
    • 36. Default VIVO theme
    • 37. Cornell VIVO
    • 38. Griffith University
    • 39. Melbourne Find an Expert
    • 40. Visualization • Completed work ▫ Co-author visualization ▫ Sparklines ▫ VIVO world activity map
    • 41. VIVO 1.0 source code was publicly released on April 14, 2010: 87 downloads by June 11, 2010; 917 downloads by July 16, 2010. The more institutions adopt VIVO, the more high-quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology. 06/2010
    • 42. View RDF from profile page
    • 43. Requesting RDF using an Accept Header• curl -H "Accept: application/rdf+xml" -X GET
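The same content negotiation can be done programmatically; a sketch using Python's urllib, with a hypothetical profile URI (the slide's actual URL was not captured in the transcript):

```python
from urllib.request import Request

# Hypothetical individual URI; any VIVO profile URI works the same way.
uri = "http://vivo.example.edu/individual/n12345"

# Ask for RDF/XML instead of the HTML profile page.
req = Request(uri, headers={"Accept": "application/rdf+xml"})
# urllib.request.urlopen(req) would return the RDF/XML serialization.
```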
    • 44. Retrieving data with SPARQL • Fuseki SPARQL endpoint installed (not included) • Callable with a SPARQL client • Semantic Services ▫ Manages custom SPARQL queries ▫ Exposes URL for external sites ▫ Can ask for output as HTML, XML, or JSON
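A custom query against such an endpoint can be issued as an ordinary HTTP GET. This sketch builds the request URL for a hypothetical local Fuseki dataset; the endpoint path and the query itself are illustrative:

```python
from urllib.parse import urlencode

# Hypothetical Fuseki dataset path; deployments choose their own.
ENDPOINT = "http://localhost:3030/vivo/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name WHERE {
  ?person a foaf:Person ;
          rdfs:label ?name .
} LIMIT 10
"""

def sparql_url(query, fmt="application/sparql-results+json"):
    """Pass the query and desired output format as GET parameters."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})

url = sparql_url(QUERY)
```

Fetching this URL would return the matching people as JSON, which an external site can then transform into HTML.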
    • 45. Semantic Services application
    • 46. Hector Abruna in VIVO
    • 47. Hector Abruna on Chemistry Site
    • 48. Viewing VIVO data with Drupal • Import data with the Feeds module and Linked Data Importer • Examples
    • 49. CALS Impact Statements
    • 50. AgriVIVO home page
    • 51. AgriVIVO map page
    • 52. AgriVIVO
    • 53. VivoSearch: search across multiple VIVO sites
    • 54. VIVO Searchlight bookmarklet
    • 55. VIVO Searchlight
    • 56. Some links • Vivoweb ▫ • Vivoweb on SourceForge ▫ • VivoSearch ▫ • VIVO Wiki on DuraSpace ▫ • Mailing lists ▫
    • 57. Thank you
    • 58. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service. NISO/DCMI Webinar • May 22, 2013. Questions? All questions will be posted with presenter answers on the NISO website following the webinar:
    • 59. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you! THANK YOU