Can bibliographic couplings 
inform the structure of large 
public universities? 
Kevin Lanning 
Xingquan Zhu 
Fla Atl U 
Note: Slides preceded by # were not 
included in the presentation due to time 
constraints. 
Updated analyses and a more detailed 
report are available from 
lanning@cal.berkeley.edu
Background: A network of disciplines 
Higher Ed 
Administration 
How can the social system of 
the university be changed to 
better serve people? 
Social sciences 
Psychology 
Personality 
How can an empirical 
approach inform the 
structure of a university? 
Data sciences 
SNA 
Bibliometrics 
Can preferential attachment - a 
feature of networks and a pervasive 
source of inequality - be overcome? 
People are ‘real,’ though transient, 
disciplines are constructions, 
though enduring
The organizational model: 
Advantages and limitations Existing U 
Arts and 
Letters 
Anthro 
English 
Baker Cavell Douglas 
Science 
Biology
An a priori model 
Three broad, interconnected, hierarchically articulated themes 
• Trust 
• Peace studies, diplomacy, cybersecurity 
• Preparedness 
• Disaster, climate change, trauma 
• Vulnerable populations 
• Healthy aging, immigration, early childhood 
The model… 
fails
Towards an empirical approach 
A pilot study 
Limitations: thin data, self-report
The EBRP project 
Bibliographic couplings 
8000 papers, 108000 papers cited therein 
A bipartite graph 
Univ scholar-> cited papers <-Univ scholar 
In most analyses, projected onto one mode 
Univ scholar <-> Univ scholar 
Without the Elsevier Bibliometric 
Research Program (EBRP), this work 
would not have been done
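
A minimal sketch of the projection described on this slide, assuming each scholar is represented simply by the set of references they cite and that every shared reference counts equally. The function name `project_couplings` and the toy data are hypothetical illustrations, not the tooling used in the study (which relied on Access and Gephi).

```python
from collections import defaultdict
from itertools import combinations

def project_couplings(cites):
    """Project a scholar -> cited-paper bipartite graph onto scholars.

    `cites` maps each scholar to the set of references they cite.
    Returns {(scholar_a, scholar_b): number of shared references},
    with each pair stored once in sorted order (undirected, symmetric).
    """
    # Invert the mapping: reference -> scholars who cite it.
    cited_by = defaultdict(set)
    for scholar, refs in cites.items():
        for ref in refs:
            cited_by[ref].add(scholar)

    # Every reference shared by two scholars adds 1 to their edge weight.
    weights = defaultdict(int)
    for scholars in cited_by.values():
        for a, b in combinations(sorted(scholars), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Toy data (hypothetical names and references, not from the study).
toy = {
    "Scholar A": {"ref1", "ref2", "ref3"},
    "Scholar B": {"ref2", "ref3"},
    "Scholar C": {"ref3", "ref4"},
}
couplings = project_couplings(toy)
```

In the toy data, Scholars A and B share two references, so their edge carries weight 2. The projection keeps that count but discards which references were shared, which is the kind of information lost in moving to one mode.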
# Some concerns and clarifications 
Not a study of impact or reputation, but engagement (citing 
rather than being cited) 
Not a map of science, but of a community 
Persons are focal units 
The goal is to build social and intellectual capital 
We expect that the approach will have little utility in the 
arts and humanities.
# Some indeterminacies 
No single approach to weighting order of authorship 
A loss of information as one moves from the bipartite to a 
single mode network 
Persons vs papers as targets in the initial network 
Etc.
# Tools 
Database software (Access) 
Extensive cleaning, removal of duplicates, non-faculty authored 
papers, and disambiguating of shared names 
J. Smith -> Smith J1, Smith J2, … 
Gephi 
For network properties and visualizations 
MMNT plugin 
C-finder (Palla, cfinder.org) 
For finding and displaying overlapping communities
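
A rough sketch of how overlapping communities can be found, assuming the clique-percolation definition on which C-finder is based: a community is a union of fully connected groups of k nodes that are chained together through groups sharing k-1 nodes. This is a brute-force illustration for small graphs, not C-finder's own implementation; the function name and toy graph are hypothetical.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return overlapping communities as a list of node sets.

    A community is the union of all fully connected groups of k nodes
    that can be reached from one another through groups sharing k-1 nodes.
    Brute force over node subsets, so suitable only for small graphs.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Enumerate every fully connected group of exactly k nodes.
    cliques = [
        frozenset(group)
        for group in combinations(sorted(adj), k)
        if all(v in adj[u] for u, v in combinations(group, 2))
    ]

    # Union-find over cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    merged = {}
    for i, clique in enumerate(cliques):
        merged.setdefault(find(i), set()).update(clique)
    return list(merged.values())

# Toy graph (hypothetical): two triangles that share only node "c".
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("c", "e")]
communities = k_clique_communities(toy_edges, k=3)
# Nodes that sit in more than one community.
bridgers = {n for n in "abcde" if sum(n in c for c in communities) > 1}
```

With k=3 the two triangles stay separate communities because they share one node rather than two, and node "c" belongs to both. That overlap is what a partition-based method would not show.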
# Four approaches to 
representing the network 
of scholars 
Left panels: Bipartite 
networks (referenced 
papers are hidden) 
Right panels: Single 
mode projection 
Top panels: Targets are 
individual references 
Bottom panels: Targets 
are individual authors
The network of departments: 
Global and clique-based 
perspectives 
The communities of departments 
bear little resemblance to the 
existing colleges of the university
# Communities (k-cliques) of departments:
Communities (k-cliques) of persons I: 
The interdisciplinary core
Who are the knowledge conduits? 
Nodes ranked by 
Betweenness Centrality, 
data are University-wide
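
For readers unfamiliar with the measure, here is a small sketch of betweenness centrality, assuming an undirected graph in which link weights are ignored (whether the study's rankings used weights is not stated here). It follows Brandes' accumulation scheme; the function name and toy graph are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `adj` maps each node to the set of its neighbours. Uses Brandes'
    accumulation scheme; weights on edges are ignored in this sketch.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        dist = {s: 0}
        paths = {v: 0 for v in adj}
        paths[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting intermediaries.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Toy path graph (hypothetical): a - b - c - d
toy_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = betweenness(toy_adj)
ranking = sorted(scores.items(), key=lambda kv: -kv[1])
```

In the toy path, the two interior nodes each lie on two shortest paths between other pairs, while the endpoints lie on none, so the interior nodes head the ranking.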
Who belongs 
to multiple 
communities?
Are the conduits (betweenness) and the 
brokers (community bridgers) the same people? 
                 Tenured  Gender  Clique..  Bet…    W. Deg.  EC      PR
Tenured           1.000
Gender           -0.186   1.000
Clique bridge    -0.005  -0.037   1.000
Betweenness C     0.059   0.031   0.482    1.000
Weighted Degree   0.047   0.030   0.493    0.580   1.000
Eigenvector C     0.046   0.006   0.190    0.524   0.348    1.000
PageRank          0.052   0.030   0.448    0.658   0.872    0.446   1.000
N=346. n "clique bridges" = 13
Closing thoughts 
With respect to the title of the talk: 
There is strong evidence that potentially productive 
communities exist outside of academic units 
With respect to this talk in SciTS: 
The concept of (knowledge broker, bridge, gatekeeper) is 
multiply nested, and can be represented at multiple levels of 
analysis. 
With respect to this talk in the world: 
In a time of building walls between people, social network 
analysis can open doors.

Scits 2014


Editor's Notes

  • #3 Slide 2. Overview The present paper is influenced by, or lies at the intersection of, three areas - higher education administration, social network analysis, and personality psychology. These areas can together be represented as a network whose nodes and interconnections give rise to a number of questions, two of which are of focal interest. The first roughly corresponds to the title of the talk on which this paper is based, namely, whether an empirical, large data approach can inform the academic structure of universities, particularly mid-size, comprehensive public institutions such as Florida Atlantic University. A second question is how to identify key faculty who can potentially effect productive, even transformative, change in an institution, or at the very least, help to maintain the institution’s standing during a time of economic and cultural challenge. For me, the two questions are closely related; their relationship is derived from the premise that a successful model of a university community begins with people (who are real, though transient) rather than disciplines (which are artificial, though durable).
  • #4 It is not too much of an oversimplification to say that the academic structure of the typical American public university is a multiply-nested model in which each faculty member is assigned to a single department, each department is assigned to a single college, and in which each department corresponds to a more-or-less traditional academic discipline. The model closely resembles an organizational chart, and from an organizational standpoint, it has some real benefits. The assignment of faculty to single departments increases the likelihood that all will be judged comparably, with none suffering particularly from the double jeopardy of too-harsh assessments in multiple units or, conversely, slipping through the organizational structure unassessed, without feedback, and potentially unproductive. Problems arise not because the model fails as an administrative structure, but because it is overextended. We use this same model not only as an organizational chart, but also as a map for prospective students, and, in addition, as a model of the structure of expertise. It does not serve these other functions particularly well, and is likely to impede collaboration between faculty members in different units. Partly in response to these limitations, I was charged with articulating a new structure of the community of scholars engaged in the study of so-called “societal issues,” a grab bag of scholarly topics which engaged faculty in the social sciences, education, nursing, and environmental studies areas across the nine colleges and many campuses of Florida Atlantic University. It was and is a very interesting problem, and my work on this has extended beyond the length of my administrative appointment.
  • #5 I began by elaborating upon the university’s strategic plan, articulating an a priori model based on topics such as trust, persons in environments, and healthy aging. When I went to departments across the university to discuss this developing model, it quickly became apparent that this approach could not be successful, in part because it could provide only a somewhat arbitrary skeleton, but also because many faculty resisted – understandably – the imposition of a new structure from above. This was an empirical reason for trying an empirical approach.
  • #6 Consequently, I decided to explore an empirical approach, grounded in actual faculty achievements. I began with a pilot analysis which was based on network analyses of a small sample of respondents to a survey. In the present paper, I describe a much more ambitious approach in which a faculty network of the entire university is based upon shared references, or bibliographic couplings. The data were obtained via support from the Elsevier Bibliometric Research Program EBRP-2013, and include some 8000 papers authored or coauthored by FAU investigators and 108000 potential links or bonds in the form of the papers cited therein.
  • #9 It should go without saying that there is no single, unique solution. There is inevitably a loss of information as one moves from a bipartite to a single-mode network. One approach is to consider the single-mode network as one populated by directed, asymmetrical reciprocal linkages; here, I’m using a simpler approach in which the single-mode network is treated as undirected, and relationships are treated as symmetrical. Another indeterminacy occurs in that one could choose references or individuals as targets. A third is that the weighting of faculty authorship can be handled in many ways (ref); here, I’ve simply treated all contributions equally. faculty->reference<-faculty network to the single-mode faculty-faculty network; that is, the same weight might reflect … A loss of information, indeterminacies, optimal solutions… And by focusing on individuals, the number of data points that define each node and edge can become small, losing the advantage of robustness against error.
  • #10 Database software (Access). Didn’t help me navigate the conference schedule. Extensive cleaning, removal of duplicates and non-faculty authored papers, and disambiguation of shared names (R. Sherman -> Sherman R1 or Sherman R2). The EBRP data also include second-order references. Gephi: MMNT (multimode network transformation) plugin to reduce bipartite to single mode networks. C-finder (Palla, Cfinder.org). In analyses of community structure, I used the c-finder software, which provides analyses and visualizations of communities (k-clique communities, defined as unions of fully connected groups of k nodes that can be reached from one another through adjacent groups sharing k-1 nodes). One important advantage of the approach is that it allows recognition of overlapping structures; another is that it instantiates the notion of “family resemblance” as the defining characteristic of category membership. (That is, categories are not defined by a set of necessary and sufficient properties, but by features shared with a subset of other group members).
  • #11 In these diagrams, individual faculty are represented by college (color). Node size corresponds, I believe, to weighted degree. Left panels: Bipartite networks (referenced papers are hidden) – no edges between scholars. Right panels: Single mode projection. Top panels: Targets are individual references (e.g., two investigators both cite Xavier & Smith, 2008). In the bottom panels, targets are individual authors (two investigators both cite a paper by Xavier). This second approach is denser but noisier, and on closer examination it included too many problems to be useful. And while simple proximity and community membership in the true bipartite networks are informative, they are, I think, trumped by the simplicity of the projected graphs. That said, across the representations some similarities may be noted: For example, Business (yellow), and Nursing (blue) are well represented and coherent; Engineering (pale blue) and Science (peach/gold) are well-represented but appear less coherent.
  • #12 Here, I consider communities among the 46 departments of the university, in which links arise because faculty members in two departments cite the same sources. In the illustration at left, colors correspond to colleges; this graph was generated using Gephi. In this graph, departments in the College of Arts and Letters, like those of the tiny Honors College, are on the periphery, literally marginalized by the approach. The seven departments in the College of Education are in close proximity, as are the six departments in the College of Business. This is not true, however, for the five departments of the College of Design and Social Inquiry, in green. (The Colleges of Medicine and Nursing each consist of a single department). At right are illustrated three overlapping cliques which lie at the core of the graph, generated from c-finder. In this first image, I’ve filtered out all but the strongest links (those joined by a minimum link weight of 15) in order to reveal only the most robust links between departments. With this minimum link weight of 15, three overlapping k-cliques appear, together comprising 11 departments, each of which is linked to three or more other departments. The first of these communities includes Geosciences, Biology, Urban and Regional Planning, Computer and Electrical Engineering and Computer Science (here, EECS), and Medicine. This is the most tightly connected large group of departments in the University, but these five areas span four colleges. I examine these departments more closely below. These strong cliques of departments are strikingly heterogeneous with respect to the colleges from which they are drawn; in the two cliques with five departments, each is drawn from four different colleges. The remaining clique includes three departments in the College of Business together with psychology. The communities of departments bear little resemblance to the existing colleges of the university.
  • #13 From here, we’ll go in two directions: first to expand upon the network of departments, after which I’ll turn to the persons in these departments. For the departments, I iteratively relax the minimum link weight criterion to allow more clusters to appear. Recall that the initial group of three communities was based on a minimum link weight of 15. When we lower this to 12, four additional departments appear in communities (k-cliques) of four or more. The left-most cluster includes four of the six departments in the College of Business. A fifth business department (Information Technology and Operations Management) appears in the community directly adjacent to this. The structural relationship of these five departments illuminates the centrality of Management and Marketing within the FAU College of Business; the relative separateness of the sixth department in the College of Business (Economics) is an interesting reflection, too, on the structure of the area. Beyond the College of Business, this community (in green) also serves as a bridge between all three of the other communities. One of these (below right) is the community of five departments considered previously; the other defies easy classification, fusing the helping professions of nursing, social work, medicine, and psychology with the formalism of EECS and math. Taken together, these four communities include only a fraction (15/46, or less than one third) of the departments of the university, that is, those which are connected to at least three other departments by a substantial number of common references within the sample of papers under study. When including those departments with only two links to other departments, the space includes an additional ten departments, as can be seen at right. If the “minimum-weight” criterion is relaxed further, indeed eliminated, then fully 43 of the 46 departments can be represented.
  • #14 In order to examine the structure of communities of faculty, I begin with those who are members of the five closely connected departments of EECS, Geosciences, Medicine, Biology, and Urban Planning. Part of the reason I chose to look at these was that I was skeptical of the method given the seeming heterogeneity of the result. Here, I used a minimum link strength of 5. The largest clique is shown in the small figure. It includes faculty from four of the five departments (all but EECS), and three separate colleges (Medicine, Science, and Design). Clockwise from the top, the geoscientist has doctorates in Geospatial Information Sciences and Ocean Remote Sensing. The urban planner studies housing, community development, and real estate economics. The three faculty in Medicine include a bench scientist who studies gene/promoter elements in myocardial tissue and has a secondary appointment in biology, one who examines the molecular genetics of eye diseases, and a third whose interests include gene expression profiling and gene regulation in the heart; finally, the biologist studies molecular mechanisms of plant growth, and has coauthored with faculty in medicine. This first group of 6 individuals is fully connected. Though we are still looking at individuals who are connected by a link strength of at least 5, now we will look at individuals who are connected to as few as two others. The network that appears is a community of communities, with engineering at the “head,” medicine in the trunk, and biology, geosciences, and urban planning primarily behind. Anthropomorphizing aside, the figure illustrates how the idea of “knowledge broker” can be multiply nested: in addition to persons who serve as bridges between communities, there are communities which bridge communities, and even communities which bridge communities of communities. More concretely, for this group of faculty and departments, the departments appear quite coherent as groupings.
Alternatively, at my university, the assignment of departments to colleges approaches randomness, while the assignment of faculty to departments does not.
  • #15 The role of individuals as bridges between groups has been highlighted by a number of speakers already at the conference. In network analysis, there is one parameter which seems close to the idea of “knowledge broker:” It is betweenness centrality, which is a measure of how frequently one appears on the shortest path between other nodes in the network. When we look across the whole university, we can connect 346 faculty. Four of the top five on betweenness centrality are women, drawn from departments of urban planning, psych, nursing, and geosciences. The fifth is a male in EECS.
  • #16 I want to return to the community approach I described before, using now the whole network of FAU faculty. In order to keep this simple, I’ve used a “link strength” of at least (?) 5. I found 13 faculty who spanned multiple communities, where communities are defined at k=3 through 6. Here, I illustrate a few examples. The first is a classic example of an interdisciplinary bridge, linking faculty in the med school to those in engineering. The second spans two more seemingly diverse groups, one comprised of faculty in management and planning, the other primarily of faculty in Exercise and Sport Science. The third also involves Exercise Science. Here, a faculty member in that area bridges two groups which include, on the one hand, biology, and on the other, marketing, management and social work.  
  • #17 The answer is ‘often’ – it is the most productive people who are typically the most connected – but not invariably. Of the 13 individuals identified as bridgers in this analysis, five are in the top ten in betweenness centrality (out of 346 faculty). To start to look at this more closely, I computed a quick correlation matrix of the different measures of scholarly impact. These values are attenuated by differences in the shapes of the variables under study, particularly for clique-bridger. The gender effect that was suggested by the first few individuals at the top of the distribution disappears here.