Short paper published in IEEE Visualizations in Practice workshop. Phoenix, AZ.
Scholars@Cornell: Visualizing the Scholarship Data
Muhammad Javed*
Cornell University Library
Sandy Payette†
Cornell University Library
ABSTRACT
Cornell University Library (CUL) has consistently leveraged new and emerging technologies to improve access to and discovery of scholarly resources and to preserve and archive them for future generations. As stewards of the scholarship data, one of our goals is to record the research being done at Cornell, improve the visibility of this research, and enable discovery of explicit and latent patterns of scholarly collaboration. In 2003, the VIVO¹ project was initiated at Cornell University Library to achieve this goal [1]. VIVO is a web-based software application, now used by more than a hundred institutions around the world, and is maintained under the umbrella of Duraspace², a non-profit organization that supports open technology projects. As described, “VIVO creates a connected, integrated record of the scholarly work of an institution, ready for reporting, visualization, and analysis”. The VIVO ontology³ provides a model for representing data about research areas and scholarly publications through structured definitions of persons, publications, and organizations as well as the semantic relationships among them. The software provides a default user interface for browsing list-oriented views of the data, primarily in the form of faculty profiles. VIVO-based structured data can, however, be a resource for many other types of applications, and the faculty and publication data is capable of revealing much more about patterns and dynamics of scholarship than list views show. Such data can support universities in their systems for managing faculty information. Additionally, new web applications can be developed to use this data in dynamic data visualizations that showcase research areas and expertise. Finally, new types of knowledge portals can be built to leverage VIVO-modeled data for questions whose answers can be found in the direct and indirect linkages of data joined together from different domains of institutional knowledge.
A new project of CUL is Scholars@Cornell⁴, a data and visualization service built upon VIVO’s semantic, linked data knowledge-base that represents the record of scholarship produced by Cornell faculty and researchers. While adhering to the VIVO ontology, our work on Scholars@Cornell helps move VIVO forward in the technology areas that require a looser coupling of backend and frontend technologies. One key question we set out to answer was “how can visual mediation help users navigate the rich semantic data that represent the scholarship recorded in the VIVO knowledge-base?” Can visualizations be used to make the content more consumable and to answer questions that cannot easily be answered by browsing list views?
In our work at CUL, the primary entity of interest is scholarship, of which people and organizations are, by definition, both the creators and consumers. From this perspective, attention is focused on aggregate views. Scholars@Cornell provides visualizations-driven aggregate views (see figure 1) of the scholarship data where dynamic visualizations become the entry points into a rich graph of knowledge that can be explored interactively to answer questions such as: Who are the experts in what areas? Which departments collaborate with each other? What are the patterns of interdisciplinary research? What is the evidence of my research’s impact? Which global organizations do we collaborate with? Our goal is to enable easy discovery of both explicit and latent patterns that reveal high-impact research areas, patterns of collaboration, and expertise of faculty and researchers.
Figure 1: Scholars@Cornell: Research and Scholarship across the university.
*e-mail: mj495@cornell.edu
†e-mail: sdp6@cornell.edu
¹ http://vivoweb.org
² http://duraspace.org
³ http://vivoweb.org/sites/vivoweb.org/files/vivo-isf-public-1.6.owl
⁴ https://scholars.cornell.edu
Key components of the system are Symplectic Elements⁵, which provides automated publication citation feeds from external sources (such as Web of Science, Scopus, and PubMed); the Scholars Feed Machine, which performs automated data curation tasks; the Postprocessing Module, which queries the knowledge-base and generates the data for the D3 visualizations; and the VIVO semantic linked data store. The new “VIZ-VIVO” component bridges the chasm between a back-end of semantically rich data and a front-end user experience that takes advantage of new developments in the world of dynamic web visualizations [3]. We demonstrate a set of D3 visualizations that leverage relationships between people (e.g., faculty), their affiliations (e.g., academic departments), and published research outputs (e.g., journal articles by subject area).
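As an illustration of the kind of transformation a postprocessing step might perform, the following minimal sketch flattens a query result into the node-link JSON shape that D3 force layouts commonly consume. It assumes the knowledge-base is queried with SPARQL and that results arrive in the W3C SPARQL JSON results format; the variable names, URIs, and the `to_node_link` helper are hypothetical and are not taken from the Scholars@Cornell code base.

```python
import json

# Hypothetical query result in the W3C SPARQL 1.1 JSON results format.
# Variable names and example.org URIs are invented for illustration.
SPARQL_RESULT = {
    "head": {"vars": ["person", "department"]},
    "results": {"bindings": [
        {"person": {"type": "uri", "value": "http://example.org/person/a"},
         "department": {"type": "uri", "value": "http://example.org/dept/x"}},
        {"person": {"type": "uri", "value": "http://example.org/person/b"},
         "department": {"type": "uri", "value": "http://example.org/dept/x"}},
    ]},
}


def to_node_link(result, source_var, target_var):
    """Flatten SPARQL bindings into a {"nodes": [...], "links": [...]} dict."""
    nodes = {}  # URI -> node dict; de-duplicates while keeping insertion order
    links = []
    for row in result["results"]["bindings"]:
        src = row[source_var]["value"]
        tgt = row[target_var]["value"]
        nodes.setdefault(src, {"id": src, "group": source_var})
        nodes.setdefault(tgt, {"id": tgt, "group": target_var})
        links.append({"source": src, "target": tgt})
    return {"nodes": list(nodes.values()), "links": links}


graph = to_node_link(SPARQL_RESULT, "person", "department")
# Serialized once, the JSON can be served statically to the front end.
print(json.dumps(graph, indent=2))
```

Generating such files ahead of time is one way to keep the front end loosely coupled to the triplestore, since the browser then never issues queries itself.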
The visualizations embedded in Scholars@Cornell emerged from our continuous discussions with Cornell academic units, provost office representatives and other research institutes. One of their needs is to demonstrate the research interests of the faculty. Scholars@Cornell presents the research interests of the faculty of a department in the form of a person-to-subject area network (see figure 2). Different citation indexing applications (e.g. Web of Science, PubMed, Scopus) classify publication venues (e.g. journals and conferences) into different subject categories. Therefore, all the articles published in a given venue receive the same categories as the venue. To present the research interests of an author, we apply the same transitive approach and link the subject areas of a journal to the authors of the articles published in that journal. The person-to-subject-area map helps identify the research interests of a faculty member and potential collaborators for future grant proposal writing and research. These subject areas are fairly general. They are useful for categorizing persons (and their research publications) and narrowing down lists. However, in order to capture the core domain expertise of a faculty member, we need to go a step further.
⁵ http://symplectic.co.uk/products/elements
Figure 2: Person-to-subject area network: presenting research interests of the faculty
Figure 3: Keyword cloud: presenting domain expertise of a faculty member
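The transitive linking described above (venue to subject area, article to venue, author to article) can be sketched as a simple join. The sample data, the `person_subject_weights` helper, and the choice to weight each person-to-subject link by the number of supporting articles are assumptions made for illustration; the paper does not specify how links are weighted.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; all names are invented for illustration.
JOURNAL_SUBJECTS = {
    "Journal A": ["Ecology", "Plant Sciences"],
    "Journal B": ["Ecology"],
}
ARTICLES = [
    {"title": "Article 1", "journal": "Journal A", "authors": ["P1", "P2"]},
    {"title": "Article 2", "journal": "Journal B", "authors": ["P1"]},
    # A venue with no subject classification contributes no links.
    {"title": "Article 3", "journal": "Journal C", "authors": ["P2"]},
]


def person_subject_weights(articles, journal_subjects):
    """Propagate each journal's subject areas to the authors of its articles.

    Returns author -> Counter(subject -> number of supporting articles).
    The article-count weighting is an assumption of this sketch.
    """
    weights = defaultdict(Counter)
    for article in articles:
        subjects = journal_subjects.get(article["journal"], [])
        for author in article["authors"]:
            weights[author].update(subjects)
    return weights


weights = person_subject_weights(ARTICLES, JOURNAL_SUBJECTS)
for person, subjects in weights.items():
    print(person, dict(subjects))
```

Because every article inherits all categories of its venue, the resulting interests are only as fine-grained as the venue classification, which is why these subject areas stay general.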
Scholars@Cornell presents the domain expertise of a faculty member in the form of a keyword cloud (see figure 3). Although these keywords are extracted from the faculty member’s publication data, they overlap fairly well with the member’s skills and expertise. For evaluation purposes, we selected a group of faculty members and compared their publication keywords with the expertise listed on their personal/departmental websites and on their ResearchGate and Academia.edu profile pages. The results were also affirmed by the researchers in face-to-face meetings.
We present the co-authorship data at the person and college level (see figure 4). The inter-departmental and cross-college co-authorship data is presented using a zoomable sunburst, where a user can start exploring the visualization from a department or college of interest and zoom in to a faculty member’s co-authorship view to explore that person’s co-authors and the resulting publications. The interpersonal co-authorship view is presented using a chord diagram (see figure 5) on faculty profile pages.
Figure 4: Zoomable sunburst visualization: presenting internal co-authorships.
Figure 5: Chord Diagram: presenting Interpersonal co-authorships
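A chord layout is typically driven by a square matrix in which cell (i, j) holds the strength of the tie between entities i and j. The following minimal sketch derives such a matrix from author lists; the sample publications and the `coauthor_matrix` helper are hypothetical, and counting one unit per shared publication is an assumption rather than the documented Scholars@Cornell behaviour.

```python
from itertools import combinations

# Hypothetical author lists, one per publication.
PUBLICATIONS = [
    ["P1", "P2"],
    ["P1", "P2", "P3"],
    ["P3"],  # single-author work adds no co-authorship tie
]


def coauthor_matrix(publications):
    """Return (names, matrix) where matrix[i][j] counts the publications
    that names[i] and names[j] wrote together."""
    names = sorted({author for authors in publications for author in authors})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for authors in publications:
        # set() guards against an author being listed twice on one paper.
        for a, b in combinations(sorted(set(authors)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # co-authorship is symmetric
    return names, matrix


names, matrix = coauthor_matrix(PUBLICATIONS)
for name, row in zip(names, matrix):
    print(name, row)
```

The same pairwise counts, grouped by department and college instead of by person, would give the nested totals a sunburst needs.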
Scholars@Cornell presents global collaborations using world and USA maps (see figure 6). The idea here is to analyze our external links: understanding which global organizations we collaborate with, identifying Cornell faculty who have links to a global organization of interest, and more. A user can hover over a country on the world map (or a state on the USA map) to browse the collaboration counts and click on it to view additional details (see figure 7).
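The per-country and per-state counts shown on hover can be sketched as a grouped aggregation. The records, field names, and the `counts_by` helper below are hypothetical, and counting distinct co-authored publications per region is an assumption; the paper does not define what a collaboration count measures.

```python
from collections import defaultdict

# Hypothetical records: one row per (publication, external organization).
COLLABORATIONS = [
    {"publication": "pub1", "organization": "Org A", "country": "USA", "state": "California"},
    {"publication": "pub1", "organization": "Org B", "country": "USA", "state": "California"},
    {"publication": "pub2", "organization": "Org C", "country": "Germany", "state": None},
    {"publication": "pub3", "organization": "Org A", "country": "USA", "state": "California"},
]


def counts_by(records, key):
    """Count distinct publications per region, where key is "country" or "state"."""
    publications = defaultdict(set)
    for record in records:
        region = record[key]
        if region is not None:  # e.g. non-US rows carry no state
            publications[region].add(record["publication"])
    return {region: len(pubs) for region, pubs in publications.items()}


by_country = counts_by(COLLABORATIONS, "country")
by_state = counts_by(COLLABORATIONS, "state")
print(by_country)
print(by_state)
```

Using a set per region keeps a publication with several partner organizations in the same country from being counted more than once.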
At the backend, we record data in a Resource Description Framework (RDF) triplestore. Having data in RDF not only allows us to create links from local entities to external resources but also to use the entity URIs as local authorities, specifically for persons, journals and local organizations. Our approach involves working with users to evaluate the usability and functional suitability of the visualizations. By engaging our initial pilot partners, we are currently evaluating these data-driven visualizations with multiple stakeholders, including faculty, students, librarians, communication & marketing unit administrators, and the public.
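To make the linking of local entities to external resources concrete, the following minimal sketch writes two triples in N-Triples syntax: a label for a local journal URI and a link from that URI to an external record. The example.org and example.com URIs are hypothetical, and the use of `owl:sameAs` as the linking predicate is an assumption; the paper does not name the predicate used in Scholars@Cornell.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs: a locally minted authority URI and an external record.
LOCAL_JOURNAL = "http://example.org/individual/journal-1"
EXTERNAL_JOURNAL = "http://example.com/authority/journal/123"

# Each object is tagged as a URI or a plain literal.
TRIPLES = [
    (LOCAL_JOURNAL, RDFS_LABEL, ("literal", 'Journal of "Examples"')),
    (LOCAL_JOURNAL, OWL_SAME_AS, ("uri", EXTERNAL_JOURNAL)),
]


def format_object(obj):
    """Render a tagged object as an N-Triples term."""
    kind, value = obj
    if kind == "uri":
        return f"<{value}>"
    # Escape backslashes first, then quotes and newlines, inside the literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"<{s}> <{p}> {format_object(o)} ." for s, p, o in triples)


ntriples = to_ntriples(TRIPLES)
print(ntriples)
```

Since the local URI is the subject of both statements, other datasets can cite it as the authority for the journal while still following the link to the external record.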
Figure 6: Global Collaboration: presenting global collaborations using World map
Figure 7: Global Collaboration: Detailed view of collaborations info for a selected state (California) in USA map
Keywords: Scholars@Cornell, VIVO, Data Driven Documents (D3), Research Information Management System (RIMS), Visualizations, VIVO-ISF Ontology, Scholarship data, Faculty profile systems
ACKNOWLEDGMENTS
Other members of the Scholars@Cornell team contributed to the
systems development, data modeling, data feeds, and data curation.
We especially acknowledge the efforts of Joe McEnerney, Tim Wor-
rall, Jim Blake, Adam J. Smith, Mary Beth Martini-Lyons, George
S. Kozak and Alan McCarty III.
REFERENCES
[1] Krafft, D.B., Cappadona, N.A., Caruso, Corson-Rikert, J., Devare, M.,
Lowe, B.J., & VIVO Collaboration. VIVO: Enabling national network-
ing of scientists. Paper presented at the Web of Science Conference,
Raleigh, NC, 2010.
[2] Scholars@Duke, 2016. https://scholars.duke.edu
[3] Javed, M., Payette, S., Blake, J., Worrall, T.: VIZ-VIVO: Towards
Visualizations-driven Linked Data Navigation. In: Proceedings of the
International Workshop on Visualizations and User Interfaces for
Ontologies and Linked Data (VOILA 2016). CEUR-WS, vol. 1704, pp.
80–92 (2016).
[4] Penn VIVO - Research expertise at Penn and beyond, 2016.
https://vivo.upenn.edu/vivo
[5] Koperwas, J., Skonieczny, L., Kozlowski, M., Andruszkiewicz, P.,
Rybiński, H., Struk, W.: Intelligent information processing for building
university knowledge base. Journal of Intelligent Information Systems,
Springer, pages 1–23, 2016.
[6] Dasiopoulou, S., Lohmann, S., Codina, J., Wanner, L.: Representing
and visualizing text as ontologies: A case from the patent domain. In:
Proceedings of the International Workshop on Visualizations and User
Interfaces for Ontologies and Linked Data (VOILA 2015). CEUR-WS,
vol. 1456, pp. 83–90 (2015).
[7] Dudáš, M., Zamazal, O., Svátek, V.: Roadmapping and navigating in the
ontology visualization landscape. In 19th International Conference on
Knowledge Engineering and Knowledge Management, pages 137–152.
Springer, 2014.
[8] Atemezing, G.A., Troncy, R.: Towards a linked-data based visualization
wizard. In Proceedings of the 5th International Workshop on Consuming
Linked Data (COLD 2014) co-located with the ISWC, 2014.
[9] Bostock, M., Ogievetsky, V., Heer, J.: D3 data-driven documents. IEEE
Transactions on Visualization and Computer Graphics (TVCG). volume
17, number 12, pages 2301 – 2309, 2011.
[10] Sarli, C.C., Dubinsky, E.K., Holmes, K.L.: Beyond citation analysis: a
model for assessment of research impact. Journal of the Medical Library
Association (JMLA), volume 98, issue number 1, pages 17–23, 2010.
[11] Thellmann, K., Galkin, M., Orlandi, F., Auer, S.: LinkDaViz –
Automatic Binding of Linked Data to Visualizations. In Proceedings of the
International Semantic Web Conference (ISWC), 2015.
[12] Abello, J., van Ham, F., Krishnan, N.: ASK-GraphView: A Large
Scale Graph Visualization System. IEEE Transactions on Visualization
and Computer Graphics (TVCG), volume 12, number 5, 2006.
[13] Bikakis, N., Liagouris, J., Krommyda, M., Papastefanatos, G., Sellis,
T.: Towards Scalable Visual Exploration of Very Large RDF Graphs. In
proceedings of the European Semantic Web Conference (ESWC), 2015.
[14] Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L. and Su, Z.: Arnetminer:
extraction and mining of academic social networks. In Proceedings
of the 14th ACM SIGKDD International Conference on Knowledge
Discovery and Data mining. pages 990–998, 2008.
[15] Osborne, F., Motta, E. and Mulholland, P.: Exploring scholarly data
with Rexplore. In International Semantic Web Conference (ISWC),
pages 460–477, Springer Berlin Heidelberg, 2013.
[16] Monaghan, F., Bordea, G., Samp, K. and Buitelaar, P.: Exploring
your research: Sprinkling some saffron on semantic web dog food. In
Semantic Web Challenge at the International Semantic Web Conference
(ISWC), vol. 117, pages 420–435, 2010.