Collaboration Recommender

A Collaboration Recommender
Based on Linked Open Data
Conforming to the VIVO Ontology
Anup Sawant, Hugh J. Devlin, Noshir Contractor (Northwestern)
Brandyn J. Kusenda, David Eichmann (Iowa)

VIVO 2012 Miami, Florida USA

This research was supported by the following grants: National Science Foundation grants
CNS-1010904, OCI-0904356, IIS-0838564, UL1RR024146-06S2 and NIH CTSA awards
UL1RR025741, 5UL1RR025741-04S3

SONIC C-IKNOW VIVO Recommender
Outline
• Motivation & Project overview
• MTML collaboration recommendation heuristics
• Report on our practical experience in building
collaboration recommender systems
• Importance of relational data in recommending
collaborations, citation in particular
• Recommendations, Future Work, Questions,
Comments, Suggestions
• Acknowledge Contributors, Collaborators, and Tools

Ascendance of Teams
Studies of 19.9 million research articles over 5 decades as recorded in the Web of Science
database, and an additional 2.1 million patent records from 1975-2005 found three important
facts.

1. For virtually all fields, research is increasingly done in teams

2. Teams typically produce more highly cited research than individuals do (accounting for
self-citations), and this team advantage is increasing over time.

3. Teams now produce the exceptionally high impact research, even where that distinction
was once the domain of solo authors.

Sources: Wuchty, Jones, and Uzzi, 2007a, 2007b

Ascendance of Virtual Teams
The trend toward virtual communities was not driven by growth in teamwork among
co-located scientists. Using the Web of Science database to analyze the
collaboration arrangements of over 4,000,000 papers over a 30-year period,
Jones, Wuchty, and Uzzi found that:

1. Team science is increasingly composed of co-authors located at different
universities.

2. These “virtual communities of scholars” produce higher impact work than
comparable co-located teams or solo scientists.

3. This change is true for all fields and team sizes, as well as for research done at
elite universities

Source: Jones, Wuchty, Uzzi (2008)

Findings for all proposal collaborations
Explaining Proposal Collaboration Relation (p*/ERGM results)

         Effects                                                         Full model (N=2,186)
Control  Isolates (single author)                                         5.447*
Control  Edge (proposal collaboration relation)                          -6.751*
Control  Weighted degree (negative measure of preferential attachment)    4.623*
H1       Gender (Female)                                                  0.021
H2       Tenure (Years since PhD)                                         0.002*
H3       Institution Tier (Top 10% universities)                         -0.098*
H4       H-index                                                         -0.014*
H5       Co-authorship                                                    2.431*
H6       Citation relation                                                1.132*
* Indicates p<0.05

Researchers are more likely to be familiar with, and to collaborate again with,
those with whom they share a collaboration history (co-authorship) or those they cite.

Lungeanu, Huang, Contractor (2012) “A network perspective on success in collaboration: Stop
citing me for our own good?”, Academy of Management

Project Goals
• Port the SONIC collaboration recommendation heuristics
to VIVO
• Gain practical experience in building systems that use
– Linked Open Data (LOD)
– SPARQL query language
• Cross-institutional recommending
– Generalize the SONIC collaboration recommendation prototype from a
single institution (Northwestern) to multiple institutions
– Explore use of distributed, federated queries
• Technology adoption study of the utilization and impact
of our social-science grounded recommendation
heuristics

WHY DO WE
CREATE,
MAINTAIN,
DISSOLVE, AND
RECONSTITUTE OUR
COMMUNICATION AND
KNOWLEDGE NETWORKS?

Social Drivers:
Why do we create and sustain networks?

• Theories of self-interest
• Theories of social and resource exchange
• Theories of mutual interest and collective action
• Theories of contagion
• Theories of balance
• Theories of homophily
• Theories of proximity
Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
about organizational networks: An analytic framework and empirical example. Academy of
Management Review

Multi-theoretical, Multi-level (MTML)
Collaboration Recommendation Heuristics
Heuristic            Social theory        Relations                 Metric
Affiliation          proximity            affiliation               neighbor
                                          coauthorship              neighbor
Cocitation           mutual interest      cocitation                neighbor
                                          coauthorship              neighbor
Most Qualified       self-interest        citation                  h-index
                                          authorship
Friend of a friend   balance              coauthorship              distance
                                                                    count of geodesics
Social Exchange      reciprocity          citation                  dyadic in-degree
Follow the crowd     contagion            coauthorship + citation   centrality
                                          coauthorship              distance
Birds of a feather   homophily            (attributes)              count
Mobilizing           collective action    coauthorship + citation   shortest path
                                                                    betweenness
Feeling lucky        probabilistic model  coauthorship              p*/ERGM
                                          citation
Monge, P. R. and N. S. Contractor (2003)
Theories of communication networks NY:
Oxford University Press

Affiliation Heuristic
• The ‘Affiliation’ score is proportional to the
number of experts in the same department as
the seeker who have not previously
collaborated with the seeker
• A form of proximity theory – social relations
are (at least in part) opportunistic

➢ “We both work in the same department so we might want to
collaborate in future.”
➢ Example - Works in the same department (Entomology and
Nematology) but never coauthored.
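
The candidate filter behind this heuristic can be sketched as follows. This is a minimal illustration and not the recommender's actual code: the function name, the dictionary layout, and the sample names are all hypothetical.

```python
def affiliation_candidates(seeker, department_of, coauthors_of, experts):
    """Return experts who share the seeker's department but never coauthored with the seeker.

    department_of: dict mapping person -> department name (hypothetical layout)
    coauthors_of:  dict mapping person -> set of past coauthors (hypothetical layout)
    experts:       iterable of qualified experts for the current query
    """
    dept = department_of[seeker]
    past = coauthors_of.get(seeker, set())
    return sorted(
        e for e in experts
        if e != seeker and department_of.get(e) == dept and e not in past
    )


# Illustrative data only.
department_of = {
    "ann": "Entomology and Nematology",
    "bob": "Entomology and Nematology",
    "cy": "Entomology and Nematology",
    "dee": "Chemistry",
}
coauthors_of = {"ann": {"bob"}}
experts = {"bob", "cy", "dee"}

print(affiliation_candidates("ann", department_of, coauthors_of, experts))
```

Here "bob" is dropped as a previous coauthor and "dee" as a member of another department, leaving "cy".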

Affiliation Recommendation

Qualified
Experts

Coauthors Same
Affiliation

Co-Citation Heuristic
• The Co-Citation score is proportional to the
number of times the seeker is co-cited with an
identified expert
• A Cognitive metric: 3rd party rating of similarity
• Mutual interest theory
– Sherif, M. (1958) "Superordinate Goals in the Reduction of Intergroup
Conflict."
➢ “I have been co-cited with a qualified person quite a few times so
I might want to collaborate with him in future.”
➢ Example – Co-cited with you 3 times.
(specifically disallows previous co-authors)
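
A minimal sketch of the co-citation count, assuming that two authors are co-cited once for every citing article whose reference list contains work by both. The data layout and all names are hypothetical; the slides do not specify how the count is implemented.

```python
def are_coauthors(a, b, authors_of):
    """True if a and b appear together on at least one article."""
    return any(a in authors and b in authors for authors in authors_of.values())


def cocitation_score(seeker, expert, references, authors_of):
    """Count citing articles whose reference lists include both seeker and expert.

    references: dict mapping citing article -> set of cited articles
    authors_of: dict mapping article -> set of authors
    Previous coauthors score 0, mirroring the rule that they are disallowed.
    """
    if are_coauthors(seeker, expert, authors_of):
        return 0
    count = 0
    for cited_articles in references.values():
        cited_authors = set()
        for article in cited_articles:
            cited_authors |= authors_of.get(article, set())
        if seeker in cited_authors and expert in cited_authors:
            count += 1
    return count


# Illustrative data only.
authors_of = {"p1": {"ann"}, "p2": {"bob"}, "p3": {"ann", "cy"}, "p4": {"cy"}}
references = {"r1": {"p1", "p2"}, "r2": {"p1", "p2", "p4"}, "r3": {"p2"}}

print(cocitation_score("ann", "bob", references, authors_of))
print(cocitation_score("ann", "cy", references, authors_of))
```

In this toy data "ann" and "bob" are co-cited by r1 and r2, while "cy" is excluded because of the shared article p3.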

Co-citation Recommendation

Qualified
Experts

Coauthors Co-cited

H-index
• A scientist has index ‘h’ if ‘h’ of his/her N papers referencing the query term have
at least h citations each, and the other (N-h) papers have no more than h citations
each.

(image source: Wikipedia)

Hirsch, J. E. (2005) An index to quantify an individual's scientific research output

Qualified H-index
• A scientist has a “qualified h index,” that is, an h-index qualified by a given
concept, based on the number of their publications which are associated
with that concept as a keyword
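
The two definitions above can be sketched in a few lines. The h-index computation follows Hirsch's definition; the paper record layout (a dict with "keywords" and "citations") is an assumption made for illustration.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def qualified_h_index(papers, concept):
    """h-index restricted to papers tagged with the given concept keyword."""
    counts = [p["citations"] for p in papers if concept in p["keywords"]]
    return h_index(counts)


# Illustrative data only.
papers = [
    {"keywords": {"nematodes", "ecology"}, "citations": 10},
    {"keywords": {"nematodes"}, "citations": 2},
    {"keywords": {"nematodes"}, "citations": 1},
    {"keywords": {"ecology"}, "citations": 50},
]

print(qualified_h_index(papers, "nematodes"))
print(qualified_h_index(papers, "ecology"))
```

For "nematodes" the citation counts are 10, 2 and 1, so two papers have at least two citations each and the qualified h-index is 2.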

Most Qualified Heuristic
• The ‘Most Qualified’ score is proportional to
the expert’s “Qualified h-index”
• Self interest theory
– Simon, Herbert (1957). “A Behavioral Model of Rational Choice”
– MacDonald, C. and Ounis, I. (2006) “Voting for candidates: adapting data fusion
techniques for an expert search task”

➢ “I like to work with someone who is most useful to me and seems to have a lot of
expertise to offer.”
➢ Example – 2 of this expert’s articles including the query term have been cited
at least 2 times.

VIVO Ontology
Representation of Concepts

• Research Areas (associated with researchers)
• Subject Areas (associated with articles)
• Free Text Keywords (associated with articles)

Friend of a Friend Heuristic
• The ‘Friend of a Friend’ score is proportional to the
number of distinct paths through which the expert is
indirectly connected to the seeker, and favors experts
close to the seeker in the collaboration network.
• Balance theory, AKA “closing open triangles”
– Monge, P. R. and N. S. Contractor (2003). Theories of communication
networks.

➢ “I like to work with someone I have not previously worked with. If I give
our mutual friend as a reference, they’re more likely to accept.”
➢ Example - Connected indirectly through Hoy, Marjorie Ann via
Co-authorship network.

Friend of a Friend …
• Network: (global) Collaboration
– (scalar) Expert attributes
• Path length: distance d from seeker u to expert e
• Number of geodesics n from seeker u to expert e

$$f_{obj}(u, e) = \frac{n_{sp}(u, e)}{d(u, e)^2}$$
(specifically disallows previous co-authors)
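
A minimal sketch of this score, assuming it is the number of geodesics (shortest paths) from seeker to expert divided by the squared path length, computed on an unweighted coauthorship graph. The breadth-first search below is a standard way to count shortest paths; the function names and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path_stats(graph, source, target):
    """Return (distance, number of shortest paths), or (None, 0) if unreachable."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                # First visit: nbr is one step further than node.
                dist[nbr] = dist[node] + 1
                paths[nbr] = paths[node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another shortest route into nbr.
                paths[nbr] += paths[node]
    if target not in dist:
        return None, 0
    return dist[target], paths[target]


def friend_of_friend_score(graph, seeker, expert):
    """Geodesic count over squared distance; 0 for coauthors and unreachable experts."""
    d, n = shortest_path_stats(graph, seeker, expert)
    if d is None or d < 2:
        return 0.0
    return n / d ** 2


# Illustrative undirected coauthorship graph (each edge listed in both directions).
graph = {
    "ann": {"bob", "cy"},
    "bob": {"ann", "dee"},
    "cy": {"ann", "dee"},
    "dee": {"bob", "cy", "eve"},
    "eve": {"dee"},
}

print(friend_of_friend_score(graph, "ann", "dee"))
print(friend_of_friend_score(graph, "ann", "bob"))
```

"dee" is two steps from "ann" along two distinct geodesics, giving 2 / 2² = 0.5, while "bob" is a direct coauthor and scores 0.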

Social Exchange Heuristic
• The ‘Social Exchange’ score is proportional to
the number of articles c authored by the
expert e which cite the seeker u
• Reciprocity theory
– Blau, P. M. (2006) Exchange and power in social life.
$$f_{obj}(u, e) = c(u, e)$$

➢ “I’ve helped them in the past, so they’re more likely to help me
now.”
➢ Example – Cited your work in 3 articles.

Follow the Crowd Heuristic
• The ‘Follow the Crowd’ score is proportional to
the expert’s overall popularity in terms of
collaboration and being cited, and favors experts
close to the seeker in the collaboration network.
• Contagion theory
– Krackhardt, D. and Brass, D. J. (1994) Intraorganizational networks: the micro
side.
– Krackhardt, D. M. (1986) Cognitive social structures.

➢ “They seem to be the most qualified person since many others are
working with them.”
➢ Example - Co-authored or cited by 5 people and is within 3 step(s)
from you via Co-authorship and Citation Network.

Follow the Crowd …
$$f_{obj}(u, e) = \frac{\deg_{in}(e)}{d(u, e)}$$
• deg_in: Expert’s in-degree in the combined
network (collaboration + citation)
• d: distance from seeker u to expert e in the
collaboration network if connected, max(d)
otherwise
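
A minimal sketch of this score, assuming it is the expert's in-degree in the combined network divided by the seeker-to-expert distance in the collaboration network. The slide does not say how max(d) is obtained; here it is taken to be the largest finite distance from the seeker, which is an assumption. All names and data are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def follow_the_crowd_scores(collab, in_degree, seeker, experts):
    """Score each expert as in-degree / distance.

    collab:    undirected collaboration graph as dict node -> set of neighbours
    in_degree: dict expert -> in-degree in the combined collaboration + citation network
    Unconnected experts fall back to the largest finite distance (assumed max(d)).
    """
    dist = bfs_distances(collab, seeker)
    max_d = max(dist.values()) or 1  # guard against an isolated seeker
    scores = {}
    for expert in experts:
        if expert == seeker:
            continue
        d = dist.get(expert, max_d)
        scores[expert] = in_degree.get(expert, 0) / d
    return scores


# Illustrative data only: "dee" has no collaboration path to "ann".
collab = {"ann": {"bob"}, "bob": {"ann", "cy"}, "cy": {"bob"}}
in_degree = {"bob": 5, "cy": 6, "dee": 4}

print(follow_the_crowd_scores(collab, in_degree, "ann", ["bob", "cy", "dee"]))
```

The division by distance means a slightly less popular expert nearby ("bob", 5 / 1) can outrank a more popular one further away ("cy", 6 / 2).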

Birds of a Feather Heuristic
• The ‘Birds of a Feather’ score is proportional to the (weighted w)
number of attributes a shared between the seeker u and the
expert e, such as moniker (title), department, grad school and
major field of study
• Homophily theory
– Foucault Welles, B., A. Van Devender, et al. (2010) Is a “Friend” a Friend? Investigating the Structure
of Friendship Networks in Virtual Worlds
• No network measures

$$f_{obj}(u, e) = \sum_{k} w_k \, a_k(u, e)$$
➢ “I find it easier to communicate with someone who has things in common
with me.”
➢ Example - Shares one or more of the following attributes: moniker, work
department, grad school and major field of study.
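
A minimal sketch of the weighted attribute match, assuming each indicator a_k is 1 when seeker and expert hold the same value for attribute k and 0 otherwise. The attribute keys and the weights are illustrative and do not come from the slides.

```python
def birds_of_a_feather_score(seeker_attrs, expert_attrs, weights):
    """Sum the weights of the attributes on which seeker and expert agree."""
    return sum(
        weight
        for name, weight in weights.items()
        if name in seeker_attrs
        and name in expert_attrs
        and seeker_attrs[name] == expert_attrs[name]
    )


# Illustrative weights and profiles only.
weights = {"moniker": 1.0, "department": 2.0, "grad_school": 1.0, "major_field": 1.5}
seeker = {
    "moniker": "Professor",
    "department": "Chemistry",
    "grad_school": "MIT",
    "major_field": "Organic chemistry",
}
expert = {
    "moniker": "Professor",
    "department": "Physics",
    "grad_school": "MIT",
    "major_field": "Organic chemistry",
}

print(birds_of_a_feather_score(seeker, expert, weights))
```

The two profiles agree on moniker, grad school and major field, so the score is 1.0 + 1.0 + 1.5 = 3.5.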

Mobilizing Heuristic
• The ‘Mobilizing’ score favors experts who are brokers and close to
the seeker in the union of the collaboration and citation networks.
• Theory of Collective Action
– Coleman, J. S. (1966) “Individual interests and collective action.”
– Laumann, E. O. and F. U. Pappi (1976) Networks of collective action

➢ “He seems to be connected to lots of qualified experts and can help me make
more useful connections.”
➢ Example – Qualified expert who is a broker among other experts.

$$f_{obj}(u, e) = \frac{\mathrm{inDeg}(e)}{\mathrm{outDeg}(e)} \cdot \frac{\mathrm{bet}(e)}{d(u, e)}$$
– fobj(u,e) : Objective function of user u and expert e
– inDeg(e) : in-degree of expert in union of the Collaboration and Citation networks.
– outDeg(e) : out-degree of expert in union of the Collaboration and Citation networks.
– d(u,e) : seeker to expert distance in union of the Collaboration and Citation networks.
– bet(e) : expert’s betweenness centrality in union of the Collaboration and Citation networks, see
Wasserman, S. and K. Faust (1995) Social Network Analysis: Methods and Applications
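
A minimal sketch of the Mobilizing score, assuming it combines the terms defined above as (inDeg / outDeg) × (bet / d) on a directed union network in which each collaboration tie is stored in both directions. Betweenness is computed with Brandes' algorithm for unweighted graphs, without normalization. Returning 0 for unreachable experts or a zero out-degree is a choice made here and is not taken from the slides.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes) for an unweighted directed graph."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    bet = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bet[w] += delta[w]
    return bet


def distances(graph, source):
    """Directed hop counts from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def mobilizing_score(graph, seeker, expert):
    """(inDeg / outDeg) * (bet / d); 0.0 when the ratio is undefined."""
    in_deg = sum(1 for nbrs in graph.values() if expert in nbrs)
    out_deg = len(graph.get(expert, ()))
    d = distances(graph, seeker).get(expert)
    if not d or out_deg == 0:
        return 0.0
    return (in_deg / out_deg) * (betweenness(graph).get(expert, 0.0) / d)


# Illustrative union network: a-b and b-c collaborate, d cites b.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"b"}}

print(mobilizing_score(graph, "a", "b"))
```

In this toy network "b" lies on every shortest path between the other nodes (betweenness 4), has in-degree 3 and out-degree 2, and is one step from "a", giving (3 / 2) × (4 / 1) = 6.0.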

Feeling Lucky Heuristic
• The ‘Feeling Lucky’ score is an estimate of the probability of
collaboration using a p*/Exponential Random Graph
Model (ERGM) of scientific team formation.
• A Probabilistic Model of relationship formation
– Wasserman, S. and G. Robins (2003) An introduction to random graphs,
dependence graphs, and p*
• Factors affecting probability
– In-Degree Centrality of expert in the union of Collaboration and Citation networks
– Publication count of expert
– Similarity (~ “birds of a feather”): moniker, work department, grad school,
major field of study
– Number of times collaborated with seeker
– Number of times cited seeker
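
One way to read such a model: in an ERGM, the conditional log-odds of a tie, given the rest of the network, is the weighted sum of the change statistics that adding the tie would produce. The sketch below applies that relationship. The coefficient names and values are hypothetical placeholders, not fitted estimates from this project.

```python
import math


def tie_probability(coefficients, change_stats):
    """Conditional tie probability: logistic of sum(theta_k * delta_k).

    coefficients: dict term name -> model coefficient (theta)
    change_stats: dict term name -> change in that statistic if the tie is added
    """
    logit = sum(
        theta * change_stats.get(name, 0.0) for name, theta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for illustration only.
coefficients = {"edge": -2.0, "times_cited_seeker": 1.0}

never_cited = tie_probability(coefficients, {"edge": 1, "times_cited_seeker": 0})
cited_twice = tie_probability(coefficients, {"edge": 1, "times_cited_seeker": 2})
print(round(never_cited, 3), round(cited_twice, 3))
```

With these placeholder values, two citations of the seeker exactly offset the negative edge term, so the log-odds are 0 and the probability is 0.5.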

Scientometric Relations
Bibliometric Relations
• Authorship relations (author-article)
– Primary evidence of historical collaboration
behavior
• Citation relations (article-article)
– An important leading indicator of future
collaboration behavior

Bibliometric Relations
Descriptions
Domain-Range     Relation       Directed/Undirected  Magnitude
author-article   authorship     directed             N
author-author    co-authorship  undirected           Y
article-article  citation       directed             N
author-author    citation       directed             Y
article-article  co-citation    undirected           Y
author-author    co-citation    undirected           Y

Citation-related Relations
Dependencies
[Diagram of dependencies among four relations: Article-Article Citation,
Article-Article Co-Citation, Author-Author Citation, Author-Author Co-Citation]

Garfield, Eugene (1955) "Citation indexes for science"
M. M. Kessler (1963) "Bibliographic coupling between scientific papers"

Citation-related Relations
Four Useful Primitive Operations
• Authorship-related (derived from VIVO)
1. Given an author A, find all articles by A
getArticles(authorURI)
2. Given an article A, find all authors of A
getAuthors(articleURI)
• Citation-related (derived from PubMed)
3. Given an article A, find all articles which cite A
getArticleArticleCitationFrom(articleID)
4. Given an article A, find all articles cited by A
getArticleArticleCitationTo(articleID)
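
The four primitives can be composed to derive author-level relations. The sketch below uses in-memory dictionaries as stand-ins for the VIVO and PubMed lookups; the snake_case function names mirror the slide's operation names, and the data layout is an assumption. It derives the Social Exchange count, the number of articles by an expert that cite the seeker.

```python
# Stand-in data; in the real system these lookups would hit VIVO and PubMed.
AUTHORSHIP = {"ann": {"a1", "a2"}, "bob": {"b1", "b2", "b3"}, "cy": {"b3"}}
CITES = {"b1": {"a1"}, "b2": {"a1", "a2"}, "a2": {"b3"}}  # article -> articles it cites


def get_articles(author):
    """Primitive 1: all articles by an author."""
    return set(AUTHORSHIP.get(author, set()))


def get_authors(article):
    """Primitive 2: all authors of an article."""
    return {a for a, articles in AUTHORSHIP.items() if article in articles}


def get_article_article_citation_from(article):
    """Primitive 3: all articles which cite the given article."""
    return {citing for citing, cited in CITES.items() if article in cited}


def get_article_article_citation_to(article):
    """Primitive 4: all articles cited by the given article."""
    return set(CITES.get(article, set()))


def social_exchange_count(seeker, expert):
    """Number of distinct articles authored by expert that cite any article by seeker."""
    citing = set()
    for article in get_articles(seeker):
        for candidate in get_article_article_citation_from(article):
            if expert in get_authors(candidate):
                citing.add(candidate)
    return len(citing)


print(social_exchange_count("ann", "bob"))
print(social_exchange_count("bob", "ann"))
```

Two of bob's articles (b1 and b2) cite ann's work, so the author-author citation magnitude from bob to ann is 2. In the other direction only a2 cites bob, giving 1.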

Linking Scientometric Data
VIVO Recommender Sources

Data category   VIVO                PubMed
Researcher ids  Very strong         Very weak
Article ids     Some PubMed Ids     Very strong
Citation data   little or none      Good
Scope           University faculty  International, 1809-present

Author Representation
VIVO vs. PubMed
Prof. Alan R. Katritzky, Department of Chemistry, University of Florida

UF VIVO: http://vivo.ufl.edu/individual/n3622
PubMed: AR Katritzky; Alan Roy Katritzky; Alan R Katritzky; A R Katritzky

Linking UF VIVO to PubMed
Approach Diagram
• VIVO: authors (Author1 to Author4) are linked to articles (Article1 to Article4)
by authorship relations; some articles carry a PubMed ID
• PubMed: articles matched by PubMed ID (Article1, Article3) are linked by citation

Publication coverage
• 8852 publications in UF VIVO
• 8037 distinct PubMed ids associated with UF VIVO publications
• ~90% of UF VIVO’s articles key into PubMed, making article-article
citation data available using Linked Open Data
[Pie chart: With PubMed ID vs. Without PubMed ID]

Faculty coverage
• 6578 Faculty Members in UF VIVO
• 990 (15%) of UF Faculty Members have at least one publication in UF VIVO
• 906 UF Faculty Members have at least one publication in PubMed
• Therefore using our approach (VIVO+PubMed mash-up) just 14% of
UF Faculty Members have the possibility of having article-article
citation data (and hence author-author citation data) available
[Pie chart: With at least one PubMed ID vs. no pubs or no pubs with PubMed ID]

Cross-Institutional Search
Previous Work (VIVO 2011)
• Direct2Experts
– http://direct2experts.org/
– Distributed query
– Links to a researcher’s home RNS
– Weber GM, Barnett W, Conlon M, Eichmann D, Kibbe W, Falk-Krzesinski
H, Halaas M, Johnson L, Meeks E, Mitchell D, Schleyer T, Stallings
S, Warden M, Kahlon M (2011) Direct2Experts: a pilot national network to
demonstrate interoperability among research-networking platforms
• VIVO Search
– http://beta.vivosearch.org/
– Centralized index of multiple sites

SPARQL Query Language for RDF

Just Say NO! to Web Crawling

SONIC C-IKNOW VIVO
Collaboration Recommender
[Architecture diagram; labels recovered from the figure:]
• Client: web browser (PC, Mac, Smart Phone, tablet)
• SONIC servers: SONIC C-IKNOW VIVO Collaboration Recommender server;
SONIC C-IKNOW VIVO p*/ERGM server running R (statnet)
• Remote servers: VIVO (Florida), VIVO (Cornell), (Iowa), PubMed
• SPARQL: profiles, publications, citations, keywords
• Stored data: user profiles, community of interest, multiple saved search criteria
• Output: ranked recommendations

Lessons learned
• Researcher Networking Systems (RNSs) should
take article-article citation data seriously
• Adding a robust SPARQL endpoint to each
VIVO-compliant RNS facilitates publishing and
sharing linked open data
• Available free and open source software
(FOSS) tools are mature and more than
adequate to begin building interesting
applications on RNSs

Lessons learned …
VIVO Ontology
• Embrace the existing support in the already
included bibo ontology for article-article
citation data and populate the data
• Add researcher attributes
– Year of last degree
– Gender

Future Work
• Technology adoption study for an online
collaboration recommendation tool for
research scientists
• p*/ERGM probabilistic recommendations
• Improve navigation through the concept space
using an ontology such as MeSH
• Recommend entities

Demonstration
• http://ciknow1.northwestern.edu/vivorecommender/

• Migrating soon to:
http://ciknow.northwestern.edu/vivorecommender/

• GitHub:
http://github.com/soniclab
http://github.com/soniclab/vivo-recommender

Open Source Software Stack
• Java – programming language
• Apache Jena
– RDF interface
– ARQ: SPARQL support
• Java Universal Network/Graph Framework (JUNG) –
social network analysis (SNA) algorithms
– Centrality measures
– Degree of nodes
– etc
• JUnit – unit testing and quality assurance
• Data-Driven Documents (D3) - visualization

Our Collaborators
• University of Florida
– Mike Conlon
– Nicholas Rejack
– Stephen Williams
• University of Iowa
– David Eichmann
– Brandyn Kusenda
• Cornell University
– Jon Corson-Rikert
– Brian Caruso
– Christopher Manly
– John Fereira

SONIC Contributors
• Anup Sawant • Hugh Devlin

• Joe Gilborne • Willem Pieterson

• Jinling Li • Noshir Contractor
