#ALAAC15 Linked Data Love

Linked Data Love: research representation, discovery, and assessment


The explosion of linked data platforms and data stores over the last five years has been profound, both in the quantity of data and in its potential impact. Research information systems such as VIVO play a significant role in enabling this work. VIVO is an open source, Semantic Web-based application that provides an integrated, searchable view of the scholarly activities of an organization. The uniform semantic structure of VIVO-ISF data enables a new class of tools to advance science. This presentation will provide a brief introduction to and update on VIVO and present ways that this semantically rich data can enable visualizations, reporting and assessment, next-generation collaboration and team building, and enhanced multi-site search. Libraries are uniquely positioned to facilitate the open representation of research information and its subsequent use to spur collaboration, discovery, and assessment. The talk will conclude with a description of ways librarians are engaged in this work, including visioning, metadata and ontology creation, policy creation, data curation and management, technical work, and engagement activities.

Kristi Holmes, PhD
Director, Galter Health Sciences Library
Director of Evaluation, NUCATS
Associate Professor, Preventive Medicine-Health and Biomedical Informatics
Northwestern University Feinberg School of Medicine

  1. Linked Data Love: research representation, discovery, and assessment. Kristi Holmes, PhD, @kristiholmes. Linked Library Data Interest Group, #alaac15, June 27, 2015
  2. The Semantic Web: a value proposition. At its heart, the Semantic Web is really about extending standard Web technologies to better deal with data on the Web. If the WWW is for people, the Semantic Web is for machines. George Thomas and Jim Hendler. Data modeled as bidirectional relationships. Web-based infrastructure of standards and technologies which allows for a distributable, machine readable description of data that allows for stronger data and smart web application linkages
  3. Let’s talk about the data… The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other related data.
  4. Let’s think about why this is important on the institutional level…
  5. We face a number of challenges on our campuses! Faculty: • Research is increasingly interdisciplinary • How can you find collaborators, track competitors, and stay abreast of current research inside large institutions, at other institutions, and globally? • How can you find others with shared interests or expertise? • How can you build diverse teams? Find mentors? Be identified as a partner by community groups? Support (facilities and personnel): • Library administration or directors of core facilities want to align their strategic plan with the evolving research needs of their clientele. • Identifying growth areas of research through increasing publications, focused areas of research, and grant dollars makes this task more evidence-based. Administrators: • Research institutions can be extremely large and diverse • How can administrators showcase and monitor research activity, track competitors, and stay abreast of current research inside large institutions, at other institutions, and globally? • How can you enhance visibility and present a unified picture of an institution?
  6. Research networking can help. Information about scholars is optimized using a Web-based infrastructure of standards and technologies which allows for a distributable, machine readable description of data that allows for stronger data and smart web application linkages across many universities, agencies, and societies, both within the US and abroad. Why is this important? Linked data infrastructure allows for: • Visualizations, research and clinical data integration, and deep semantic searching across multiple types and sources of data • By breaking data out of traditional database silos, research networking platforms promote a network effect within a single site and across multiple sites: the value of the network increases with the amount of linked data and applications that are available to consume the linked data.
  7. Let’s talk about research networking in the context of VIVO – what is it? 1. An open source semantic web application 2. An information model 3. An open community
  8. What is VIVO? 1. An open source semantic web application 2. An information model 3. An open community. VIVO is one research networking platform, although there are others. Organizations make decisions about adopting these tools based on many different features. The most important aspect isn’t the software, it is the data! More on that later…
  9. VIVO: An open-source semantic web application that enables the discovery of research and scholarship across disciplines in an institution. VIVO harvests data from verified sources and offers detailed profiles of faculty and researchers. Public, structured linked data about investigators' interests, activities, and accomplishments, and tools to use that data to advance science. VIVO enjoys a robust open community space to support implementation, adoption, and development efforts around the world.
  10. A VIVO profile allows you to: Showcase credentials, expertise, skills, and professional achievements for individuals and campus groups. Connect within focus areas and geographic expertise. Simplify reporting tasks and link data to external applications, e.g., to generate biosketches or CVs or for reporting purposes. Publish the URL or link the profile to other applications. Discover potential colleagues or campus resources by work area, authorship, and collaborations. Display visualizations of expertise areas or complex collaboration networks and relationships.
  11. What is VIVO? 1. An open source semantic web application 2. An information model 3. An open community
  12. CTSA: Recommendations and Best Practices for Research Networking. The Research Networking Recommendations were approved by the CTSA Consortium Executive and Steering Committee on October 25, 2011. Recommendations for Research Networking: • Recommendation: All CTSAs should encourage their institution(s) to implement research networking tool(s) institution-wide that utilize RDF triples and an ontology compatible with the VIVO ontology. • Recommendation: Information in people profiles at institutions should be publicly available as data as a general principle, specifically as Linked Open Data. To ensure quality of information, authoritative electronic data sources versus manual entry should be emphasized. Institutions will vary in the amount of information that they will include and make publicly available but the value is enhanced by the quality and quantity of information. • Recommendation: Monitoring of the research networking landscape, technology, and tools should continue to be overseen by experts from the CTSA consortium (e.g., the Research Networking group of the Informatics KFC).
  13. Building a large web of data, greater than any one effort, greater than any one platform. Data Creators, Data Aggregators, & Data Consumers. Repositories. Tools. Applications. Workflows
  14. A couple of local examples…
  15. Brown University
  16. Weill Cornell Medical College
  17. WCMC CTSC’s VIVO data sources
  18. Duke University
  19. Data, Tools and Scientists
  20. VIVO search scenarios • Multiple campuses of one university • Regional connections - e.g., Illinois ties with regional federal labs • Consortia – 62+ CTSAs, USDA plus land grant universities • International - 13 Netherlands universities and the National Library - German Universities - AgriVIVO – UN FAO. Searchlight, AgriVIVO, etc.
  21. CTSAsearch: Concept Coverage • Research networking systems queried: 57 - SPARQL endpoints queried: 9 - Sites crawled: 48 • Institutions indexed: 64 - CTSA institutions: 27 • Total person URIs: 4,933,757 - Unique individuals profiled: 140,949 - 300,239 • Total publications by those persons indexed as part of their profile: 8,396,744 • Total co-author pairs (two people on the same paper): 48,012,993 • The harvesting times listed below are the times required to interrogate the respective SPARQL endpoints or crawl the respective servers and cache the results locally at Iowa.
  22. What is VIVO? 1. An open source semantic web application 2. An information model 3. An open community
  23. VIVO Community • DuraSpace wiki • Calls and listservs - Ontology - Development - Implementation - Outreach - Tools and Apps • Social Media - Facebook - LinkedIn - Twitter • Events • Annual conference • Implementation Fest • Workshops • Hackathons
  24. VIVO Community
  25. VIVO projects around the world
  26. Current and future VIVO efforts
  27. VIVO Updates. VIVOs: • 150+ implementation and pilot projects • 35+ countries • 20+ CTSAs. Standards: • CTSAconnect Integrated Semantic Framework ontology • ORCID • CASRAI • others. Partners: • Symplectic • euroCRIS • W3C, DERI, ConceptWeb Alliance, OpenPHACTS • Institutions/organizations. Events: • VIVO conference Aug. 2015 • Spring Implementation Fest @ OHSU • DuraSpace VIVO webinars • Hackathon. Community: • VIVO wiki • Listservs • Weekly calls • GitHub • @VIVOcollab
  28. The changing role of libraries
  29. What roles can the library play? Libraries and library staff: • Are a trusted, neutral entity • Have a tradition of service and support • Strive to serve all missions of the institution • Are technology centers and have IT and data expertise • Have skills: information organization, instruction, usability, subject expertise • Have close relationships with their clients (buy in) • Understand user needs • Understand the importance of collaboration and know how to bring people together • Have knowledge of institution, research, education, clinical landscape
  30. What roles can the library play? Librarians are successfully stepping up to the semantic web plate in a variety of roles related to institutional research networking platforms. • Outreach and adoption activities • Education and training on the use of the platform • Ontology and controlled vocabulary expertise, extending the model • Negotiations with data providers • Programming, technical support • Workgroup representation • …and more! Research networking also provides an opportunity for libraries to become familiar with many concepts around linked open data and the semantic web.
  31. Building an ecosystem for evaluation and continuous improvement
  32. Northwestern University Clinical and Translational Sciences (NUCATS) Institute. Mission: Speeding transformative research discoveries to patients and the community
  33. Library as Partner Opportunity! Metrics and Impact Core Digital projects
  34. Digital Projects led by Digital Systems and Collection Services. Among other projects… Symplectic Elements - Back-end bibliometric aggregator - Support OA with repository integration - Facilitates reports and reuse of clean aggregated data from a number of diverse sources. Digital repository - We’ll gain the ability to create, share, and preserve attractive, functional, and citable digital collections and exhibits - Promotes discovery and access of FSM scholarship, both traditional and alternative outputs - Better metrics
  35. Symplectic Elements Tracking, evaluation, and reporting Digital Asset Management System (IR) Tasks (CVs and biosketches, etc.) Research Information Systems The Symplectic Elements platform & data will help facilitate new avenues of support
  36. Our shop is committed to open source principles and we leverage semantic web languages and architecture whenever possible to support open science. We want to optimize discoverability and dissemination of content and enhance the impact of FSM, NUCATS, and our Northwestern Medicine community.
  37. Going beyond the counts to find evidence of meaningful impact • Measurement instruments • Continuing education materials • Cost-effective intervention • Consensus development conferences • American Medical Association Current Procedural Terminology (CPT) codes • Change in delivery of healthcare services • Gray literature • New experimental methods, databases or software tools • New diagnostic criteria or standards of care • Biologics • Curriculum guidelines • Clinical/practice guidelines • Quality measure guidelines. Pathways: Advancement of Knowledge, Clinical Implementation, Legislation and Policy Enactment, Economic Benefit, Community Benefit
  38. Bringing scholarship out into the open. Enhancing discovery. Enhancing impact.
  39. Hope to see you at the conference in August!
  40. Acknowledgements. Teams: • The amazing team at Galter Library • VIVO Colleagues worldwide. Support: • Northwestern University Clinical and Translational Sciences Institute, NIH award UL1TR000150 • VIVO, NIH award U24 RR029822 • VIVO/DuraSpace. Questions/Follow-up: • Twitter: @kristiholmes
  41. Thank you! Kristi Holmes @kristiholmes
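
A note on slides 3 and 21: the idea that with linked data "when you have some of it, you can find other related data" can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not VIVO code: the `ex:` identifiers, the `ex:authorOf` and `ex:memberOf` predicates, and the function names are all invented for the example and are not terms from the VIVO-ISF ontology. Data is held as (subject, predicate, object) triples; `reachable` follows links outward from one resource, and `coauthor_pairs` shows one possible way to derive "two people on the same paper" pairs, which may differ from how CTSAsearch counts them.

```python
from collections import deque
from itertools import combinations

# Hypothetical triples (subject, predicate, object). Every "ex:" name is
# invented for illustration; none is a real VIVO-ISF term.
TRIPLES = {
    ("ex:alice", "ex:authorOf", "ex:paper1"),
    ("ex:bob", "ex:authorOf", "ex:paper1"),
    ("ex:bob", "ex:authorOf", "ex:paper2"),
    ("ex:carol", "ex:authorOf", "ex:paper2"),
    ("ex:alice", "ex:memberOf", "ex:library"),
    ("ex:dave", "ex:authorOf", "ex:paper3"),
}


def neighbors(node):
    """Return every resource linked to `node`, in either direction."""
    linked = set()
    for subject, _, obj in TRIPLES:
        if subject == node:
            linked.add(obj)
        elif obj == node:
            linked.add(subject)
    return linked


def reachable(start):
    """Breadth-first walk: everything discoverable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def coauthor_pairs():
    """Unordered pairs of people who share at least one paper."""
    authors_by_paper = {}
    for subject, predicate, obj in TRIPLES:
        if predicate == "ex:authorOf":
            authors_by_paper.setdefault(obj, set()).add(subject)
    pairs = set()
    for authors in authors_by_paper.values():
        pairs.update(combinations(sorted(authors), 2))
    return pairs


if __name__ == "__main__":
    print(sorted(reachable("ex:alice")))
    print(sorted(coauthor_pairs()))
```

Starting from `ex:alice`, the walk reaches her paper, her co-author, that co-author's other paper, and so on, while `ex:dave` stays out of reach because nothing links him to the rest. That is the network effect in miniature: each added link makes more of the data discoverable.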
