VIVO at the University of Idaho

In 2012, the University of Idaho Library began implementing VIVO, an open-source Semantic Web application, both as a discovery layer for its fledgling institutional repository and as a database to describe, visualize, and report university research activity. The presenters will detail some of the challenges they encountered developing this resource, while discussing the tools and techniques they used for obtaining, editing, and uploading institutional data into the RDF-based VIVO system.

1. VIVO at the University of Idaho
   SHINY HAPPY PEOPLE HOLDING NODES: USING VIVO (A SEMANTIC WEB APPLICATION) TO REVEAL UNIVERSITY OF IDAHO RESEARCH AND RESEARCHERS
2. What is VIVO?
   • An open-source …
   • Semantic Web application …
      ◦ RDF (Resource Description Framework) triples: controlled subject-predicate-object expressions that produce consistent relationships and support data harvesting procedures
      ◦ Data structured so that it can be shared and reused using Linked Data practices and standards …
   • Freely available, with a community of librarians and web developers
   • Collects, ingests, and publishes (public/private) data in batches to create a searchable, browsable, and reusable network of information on research and researchers
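To make the subject-predicate-object idea concrete, here is a minimal sketch in Python (standard library only). It writes two triples describing one person in N-Triples syntax.

The individual URI and the name are hypothetical placeholders. The rdf:type, rdfs:label, and foaf:Person IRIs are the standard W3C and FOAF vocabulary terms. This is an illustration of RDF triples, not a description of how VIVO stores data internally.

```python
# Minimal, stdlib-only sketch: RDF triples serialized as N-Triples.
# The individual URI and the name below are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def iri(value: str) -> str:
    """Render an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


person = iri("http://vivo.example.edu/individual/n1001")  # hypothetical URI
triples = [
    (person, iri(RDF_TYPE), iri(FOAF_PERSON)),        # subject is a Person
    (person, iri(RDFS_LABEL), literal("Doe, Jane")),  # subject's display name
]
print(to_ntriples(triples), end="")
```

Each printed line is one triple: the same subject appears twice, linked by different predicates to different objects. That repeated, consistent shape is what lets triples from many sources merge into one network.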
3. Early History of VIVO
   • 1997–2005: VIVO Network idea developed at Cornell for the life and social sciences
      ◦ Intended to provide a view of the sciences and research “across disciplinary and administrative boundaries.”
   • 2005: Released for the life sciences
   • 2007: Expanded to all of Cornell University (through the Library)
   • 2009: $12.2 million NIH grant provided to develop a national version with several other partners
   • 2010–present: More and more institutions adopting and developing VIVO instances from “VIVO: Enabling National Networking of Scientists”
4. VIVO at the University of Idaho
   • Spring 2012 – Fall 2012
      ◦ Approached by Idaho INBRE (a biomedical research network in Idaho) with a question about possibly installing a VIVO instance
      ◦ Installed VIVO and began setting up and learning the system, while gathering feedback from INBRE and other stakeholders
      ◦ Gained approval from INBRE faculty to publish their information in the system
      ◦ Harvested INBRE-related information from public resources: PubMed and the NIH and NSF grant databases
5. VIVO at the University of Idaho
   • Spring 2013
      ◦ Began to pursue an expanded VIVO
      ◦ Received approval from the institutional IT evaluation group to go forward
      ◦ Rebranded the instance
      ◦ Presented VIVO to library faculty and administration as a possible project going forward
      ◦ Presented the instance and a proposal for a new position to the VP of Research
6. VIVO at the University of Idaho
   • Summer 2013
      ◦ VP approved expanded use of VIVO for research groups on campus, and funding for the position
      ◦ Annie Gaines began as Scholarly Communication Librarian
      ◦ Ingest, ingest, ingest: added three additional research groups, as well as the Law School and associated faculty
      ◦ Added thousands of grants, publications, and people to the system
7. VIVO at the University of Idaho
   • Fall 2013
      ◦ Presented VIVO publicly on campus for the first time
      ◦ VIVO went live (accessible from off campus)
      ◦ Additional organizational descriptions added (departments, colleges, grant structures, etc.)
      ◦ Gained approval for, and access to, the campus database system, Banner
8. VIVO at the University of Idaho
   • VIVO today
      ◦ Beginning to explore VIVO as a front end for historical documents
      ◦ Adding all university faculty
      ◦ Creating applications and access points for the data
      ◦ Cleaning, always cleaning …
      ◦ Using this presentation as a prompt for further development of the application, and for further defining the system's presentation, our data's preservation, and our mission and goals in using the system
9. Hosting
   • Provided by the Northwest Knowledge Network (NKN): www.northwestknowledge.net
   • NKN focuses on providing technical support to researchers
      ◦ A division of UI's Office of Research
      ◦ Strong relationship with the UI Library (NKN is housed in the Library building)
      ◦ Data is replicated to a data center at Idaho National Laboratory
   • Presents future opportunities for integrating VIVO's information with other research-related tools and systems
10. Technical Specs
   • Our installation
      ◦ Apache Web Server
      ◦ Tomcat
      ◦ MySQL
      ◦ Red Hat Linux
   • Current version of VIVO: 1.5.2
      ◦ Probably upgrading to 1.6 in March 2014
11. Building VIVO – Two Approaches
   • Approach #1 – the high-resource approach (ideal)
   • Requires:
      ◦ Available programmers and developers
      ◦ A discrete IT department
      ◦ Formal IT project management
   • Advantages:
      ◦ Advanced customization and configuration
      ◦ High level of integration into existing systems and services
      ◦ Reasonably short time from inception to production
   • Disadvantages:
      ◦ Red tape
      ◦ Represents a large commitment by the unit
12. Building VIVO – Two Approaches
   • Approach #2 – the low-resource approach (practical)
   • Requires:
      ◦ An experimental mindset
      ◦ The minimum recommended staff identified in the VIVO implementation guide
      ◦ Viewing VIVO as a series of small projects, rather than one large integration into university activities
   • Advantages:
      ◦ Simple
      ◦ Manageable
   • Disadvantages:
      ◦ Time (takes much longer)
      ◦ Integration with existing services
      ◦ Creation of custom data ingest tools
13. Implementation Goals
   • Start with the low-hanging fruit: data that is easier to collect
   • When considering custom tools and processes, our priorities are:
      1. Re-use from the community or locally
      2. Buy if possible
      3. Build as needed
   • Build institutional interest in the existing data before soliciting more resources to further our development
   • Investigate third-party solutions (Symplectic Elements) as alternatives to custom-building internal methods of collecting data
14. Data Ingestion - General
   Typical workflow:
      1. Receive data in source format
      2. Convert to RDF (usually RDF/XML or Turtle)
      3. Associate with the VIVO ontology (as needed)
      4. Reconcile against the existing database
      5. Load into the application
      6. Re-index if needed
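The first four steps of that workflow can be sketched as one small Python function. The sketch rests on several assumptions:

* The source is a CSV file with a `title` column.
* Reconciliation is an exact match on case- and whitespace-normalized titles.
* `existing` is a hypothetical in-memory stand-in for a lookup against the live VIVO store.
* The `BASE` namespace and the `grantN` URI pattern are made up for illustration.

Loading (step 5) and re-indexing (step 6) happen in VIVO itself and are not shown.

```python
import csv
import io

VIVO = "http://vivoweb.org/ontology/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://vivo.example.edu/individual/"  # hypothetical namespace


def normalize(title: str) -> str:
    """Case- and whitespace-insensitive match key (a deliberately simple rule)."""
    return " ".join(title.lower().split())


def ingest(source_csv: str, existing: dict, next_id: int):
    """Steps 1-4: read source rows, convert to RDF, type them, reconcile.

    `existing` maps match keys to URIs already in VIVO; it is a hypothetical
    stand-in for a lookup against the live store and is updated in place.
    Returns N-Triples for new grants only, plus the next unused id.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(source_csv)):  # 1. source format
        key = normalize(row["title"])
        if key in existing:  # 4. reconcile: grant already known, skip it
            continue
        uri = f"{BASE}grant{next_id}"
        next_id += 1
        existing[key] = uri
        label = row["title"].replace("\\", "\\\\").replace('"', '\\"')
        # 2-3. convert to RDF triples typed with the VIVO Grant class
        lines.append(f"<{uri}> <{RDF_TYPE}> <{VIVO}Grant> .")
        lines.append(f'<{uri}> <{RDFS_LABEL}> "{label}" .')
    return "\n".join(lines), next_id
```

Consider a source that lists "Wheat Genomics" and "wheat  genomics". It produces only one new grant, because the second row collides with the key minted for the first. A real reconciliation step would usually be fuzzier than this exact-key rule.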
15. Data Ingestion - Sources
   • Public sources
      ◦ NSF, NIH, USDA awards
      ◦ PubMed
   • Commercial sources
      ◦ Web of Science (must remove “intellectual effort”)
   • CVs, publication lists
      ◦ Must have some means of soliciting them
   • Local databases (central university, research groups)
      ◦ Several institutional sources
      ◦ Must work through the gatekeepers of each
      ◦ Need a data security review to ensure that institutional concerns are met before public exposure
16. Data Ingestion - Tools
   • VIVO Harvester
      ◦ Extract, Transform, and Load (ETL) tool that takes data from a source and loads it into VIVO automatically
   • OpenRefine
      ◦ Very flexible for different data types
      ◦ Data cleaning tool
      ◦ An extension enables export in RDF format
      ◦ Reconciliation service allows us to match and deduplicate entries before export
   • Custom conversion tools (in Python)
      ◦ Used for CRIS report output, as well as other consistent but unusual formats
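OpenRefine's reconciliation service matches entries against an external source. OpenRefine also offers a related, simpler technique: key-collision clustering with a "fingerprint" key. The Python sketch below approximates that fingerprint idea to group name variants before export. It illustrates the technique only; it is not OpenRefine's code, and the sample names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Key-collision key in the spirit of OpenRefine's 'fingerprint' method."""
    text = unicodedata.normalize("NFKD", name.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(sorted(set(text.split())))  # unique tokens, sorted


def cluster(names):
    """Group raw variants whose fingerprints collide; singletons are dropped."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

For example, `cluster(["Doe, Jane", "Jane Doe", "Doe, J."])` groups the first two names under the key "doe jane" and leaves "Doe, J." alone. Initials still need human review.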
17. Ontology Extensions
   • Custom University of Idaho model prefixed with “uidaho:”
   • Goals with our extensions:
      ◦ Establish the local need before creating
      ◦ Re-use as much as possible
      ◦ Always associate classes within the VIVO hierarchy, so that data is not fully reliant on uidaho: for context
   • Examples:
      ◦ Members of Idaho EPSCoR, Idaho INBRE, REACCH-PNA
      ◦ Non-UI/courtesy faculty
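One way to enforce the rule that every local class sits inside the VIVO hierarchy is a small check over the declared rdfs:subClassOf links, run before the extension is loaded. In this sketch the `uidaho:` namespace IRI and the class names are hypothetical. The parent classes (vivo:NonFacultyAcademic, foaf:Person) are assumptions to check against the VIVO ontology version in use.

```python
UIDAHO = "http://vivo.example.edu/ontology/uidaho#"  # hypothetical namespace
VIVO = "http://vivoweb.org/ontology/core#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical extension classes -> declared rdfs:subClassOf parent.
SUBCLASS_OF = {
    UIDAHO + "CourtesyFaculty": VIVO + "NonFacultyAcademic",
    UIDAHO + "EPSCoRMember": FOAF + "Person",
    UIDAHO + "EPSCoRFaculty": UIDAHO + "EPSCoRMember",
}


def anchored(cls: str, subclass_of: dict) -> bool:
    """True if following parents from `cls` reaches a class outside uidaho:."""
    seen = set()
    while cls.startswith(UIDAHO):
        if cls in seen or cls not in subclass_of:
            return False  # cycle, or a local class with no declared parent
        seen.add(cls)
        cls = subclass_of[cls]
    return True


def to_turtle(subclass_of: dict) -> str:
    """Emit the subclass axioms as Turtle, using the uidaho: prefix."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix uidaho: <{UIDAHO}> .",
    ]
    for child, parent in sorted(subclass_of.items()):
        local = child[len(UIDAHO):]
        lines.append(f"uidaho:{local} rdfs:subClassOf <{parent}> .")
    return "\n".join(lines)


unanchored = [c for c in SUBCLASS_OF if not anchored(c, SUBCLASS_OF)]
if unanchored:
    raise ValueError(f"classes outside the VIVO hierarchy: {unanchored}")
print(to_turtle(SUBCLASS_OF))
```

Walking the whole parent chain, rather than checking only the immediate parent, lets a local class such as the hypothetical `EPSCoRFaculty` extend another local class. It still requires that the chain eventually lands in VIVO or FOAF.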
18. Data Re-use - Fuseki
   • Apache Jena Fuseki project: jena.apache.org/documentation/serving_data/
   • Enables external access to VIVO data
      ◦ Without Fuseki, data re-use is limited to those authenticated with the system
   • Created examples of data re-use to assist in marketing efforts
      ◦ Goal: to establish the added value of putting data in VIVO
      ◦ Example: labs that need to report the results of their research by creating publication lists, or that want to display spatial, temporal, or conceptual aspects of UI research to stakeholders or students, could use this feature
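Fuseki exposes a standard SPARQL protocol endpoint, so any HTTP client can query it. This Python sketch (standard library only) builds a GET request that asks for SPARQL JSON results, then flattens the response into plain dictionaries. The endpoint URL is a hypothetical placeholder; the actual dataset path depends on how Fuseki is configured.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:3030/vivo/query"  # hypothetical Fuseki dataset URL


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results_json: str) -> list:
    """Flatten SPARQL JSON results into {variable: value} dicts.

    Variables left unbound (e.g. by OPTIONAL) are simply absent from a row.
    """
    data = json.loads(results_json)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query and return flattened rows (needs a running Fuseki)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return rows(response.read().decode("utf-8"))
```

Requesting JSON keeps the output easy to hand to browser-side charting libraries, which is the kind of re-use the following examples show.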
19. Data Re-use - Fuseki
   Example 1: A very simple way to look at awards data. This presents the number of awards by agency. It uses a JavaScript library called sgvizler to turn JSON data from Fuseki into a Google Charts visualization.
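The query behind a chart like Example 1 might look like the one below. Its rows can then be reshaped into the DataTable JSON that Google Charts consumes, which is roughly the job sgvizler does in the browser.

This Python version is a hypothetical server-side analogue. It assumes the results have already been flattened into dictionaries with `agency` and `awards` keys. It also assumes grants link to funders through vivo:grantAwardedBy (VIVO 1.5-era core ontology); other VIVO versions model funding differently.

```python
import json

# Assumption: grants point to their funder via vivo:grantAwardedBy, as in the
# VIVO 1.5-era core ontology; other versions model funding differently.
AWARDS_BY_AGENCY = """
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?agency (COUNT(DISTINCT ?grant) AS ?awards)
WHERE {
  ?grant a vivo:Grant ;
         vivo:grantAwardedBy ?funder .
  ?funder rdfs:label ?agency .
}
GROUP BY ?agency
ORDER BY DESC(?awards)
"""


def to_datatable(rows: list) -> str:
    """Reshape flattened rows into Google Charts DataTable JSON."""
    table = {
        "cols": [
            {"label": "Agency", "type": "string"},
            {"label": "Awards", "type": "number"},
        ],
        # SPARQL JSON returns numbers as strings, so convert counts to int.
        "rows": [
            {"c": [{"v": row["agency"]}, {"v": int(row["awards"])}]}
            for row in rows
        ],
    }
    return json.dumps(table)
```

Counting `DISTINCT ?grant` guards against a grant being counted twice if its funder carries more than one path to the same label.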
20. Data Re-use - Fuseki
   Example 2: Another simple view using sgvizler. This shows a comparison of two variables (awards and publications) for personnel in a specific research group. It would need work as a formal graph, but it points to the ways the data can be reused.
21. Data Re-use - Fuseki
   Example 3: Another simple example of data re-use, using a JavaScript/AJAX technique to display a list of journal titles and faculty within a specific research group. Each faculty member's name links to their VIVO profile.
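Example 3 used JavaScript/AJAX in the browser; a server-side Python analogue is sketched below. The sketch makes these assumptions:

* VIVO 1.5-style authorship modeling (vivo:authorInAuthorship, vivo:linkedInformationResource, vivo:hasPublicationVenue).
* Results arrive as dictionaries with `journal`, `person`, and `name` keys.
* Each individual's URI resolves to its VIVO profile page.

The filter that limits people to one research group is left out, because it would depend on local uidaho: classes.

```python
from collections import defaultdict
from html import escape

# Assumption: VIVO 1.5-style authorship context nodes. A filter limiting
# ?person to one research group is omitted (it would use local uidaho: classes).
JOURNALS_BY_AUTHOR = """
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?journal ?person ?name
WHERE {
  ?person vivo:authorInAuthorship ?authorship ;
          rdfs:label ?name .
  ?authorship vivo:linkedInformationResource ?pub .
  ?pub vivo:hasPublicationVenue ?venue .
  ?venue rdfs:label ?journal .
}
"""


def journals_html(rows: list) -> str:
    """Render journals, each followed by links to its authors' VIVO profiles."""
    by_journal = defaultdict(list)
    for row in rows:
        by_journal[row["journal"]].append(row)
    parts = ["<ul>"]
    for journal in sorted(by_journal):
        links = ", ".join(
            f'<a href="{escape(r["person"])}">{escape(r["name"])}</a>'
            for r in by_journal[journal]
        )
        parts.append(f"  <li>{escape(journal)}: {links}</li>")
    parts.append("</ul>")
    return "\n".join(parts)
```

Escaping every value matters here, because journal titles often contain characters such as "&" that would otherwise break the markup.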
22. VIVO as Institutional Repository
23. Background
   • When Annie was brought on for Scholarly Communications, one of her tasks was to develop an IR for the UI.
   • Some potential platforms for the UI IR:
      ◦ CONTENTdm – too flat
      ◦ Bepress – too expensive
      ◦ VIVO?
24. ‘Institutional repositories’
   • “A set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.” Clifford Lynch, ARL Bimonthly Report 226, Feb. 2003.
   • “Digital collections that capture and preserve the intellectual output of university communities.” Raym Crow, The Case for Institutional Repositories: A SPARC Position Paper, SPARC, 2002.
25. ‘Institutional repositories’
   • Are:
      ◦ A collection of scholarly work
      ◦ Both cumulative and perpetual
      ◦ Institutionally defined and managed
      ◦ Open
   • Provide:
      ◦ Long-term preservation
      ◦ Wide dissemination
      ◦ A showcase for scholars and the institution
26. Challenges
   • Copyright issues, varying access
   • Buy-in from faculty, voluntary submissions
   • Getting people to care
27. VIVO as IR?
   • Not your typical IR interface
      ◦ Interconnectedness in a large network
      ◦ Includes diverse materials, not just article pre-prints
      ◦ Includes citations for all works, not just the ones hosted in the IR
      ◦ Dynamic browsing and searching
   • The linked data format allows the data to be reused for a variety of purposes
   • The following page shows a thesis document in VIVO
28. Theory vs. Practice
   • Although VIVO can act as a front end, the documents must be hosted elsewhere
   • We deposit our documents in CONTENTdm and link to the PDF from VIVO
   • This makes things easier, but also more complicated
   • See the example of the same thesis document in CONTENTdm on the next page
29. Theory vs. Practice
   • We wanted to close this presentation by asking the group some questions. If you have any advice for us on this project, we would love to hear from you!
      ◦ Are more access points better, or more confusing?
      ◦ Should we include historical documents in the VIVO IR?
      ◦ Which page should be the main collection?
      ◦ Should we provide links to all collections, or link from one into the other?
      ◦ What are best practices for unusually constructed IRs?
30. Thank you!
