1/18	
Semantometrics: Fulltext-based measures
for analysing research collaboration
Drahomira	Herrmannova	(@damirah)	
KMi,	The	Open	University	
&	
Petr	Knoth	(@petrknoth)	
Mendeley	Ltd.
2/18	
Introduction
•  To date, many studies of scientific citation,
collaboration and coauthorship networks have
focused on the concept of cross-community
ties.
•  We explore how Semantometrics can help in
understanding the nature of cross-community
ties and in characterising the types
of research collaboration in scholarly
publication networks.
3/18	
Semantometrics
•  A set of metrics for evaluating research that
build on the premise that fulltext is needed to
understand the value of publications
4/18	
Cross-community ties
•  Links between communities
5/18	
Cross-community ties
•  The importance of cross-community ties
– In citation networks, cross-community citation
patterns are characteristic of high-impact papers
[Shi et al., 2010]
– The same holds for cross-community
scientific collaboration [Newman, 2004; Lambiotte
and Panzarasa, 2009]
6/18	
How to identify cross-community ties?
•  From citation/coauthorship network
– E.g. betweenness centrality
•  From fulltext
– Semantic similarity
•  Different types of collaboration when writing
a paper
– Emerging vs. established
– Interdisciplinary vs. intradisciplinary
– Etc.
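As an illustration of the network-based route, the sketch below computes betweenness centrality with Brandes' algorithm on a toy coauthorship graph. The graph, the author labels and the function name are hypothetical and not taken from the slides. Authors who bridge two otherwise separate groups are expected to score highest.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}   # -1 marks "not visited yet"
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from source.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}

# Hypothetical toy network: two tightly knit groups joined by one tie (c-d).
coauthors = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = betweenness_centrality(coauthors)
bridging_authors = sorted(v for v, s in scores.items() if s == max(scores.values()))
print(bridging_authors)
```

In this toy graph only the two authors on the tie between the groups lie on shortest paths between other authors, so they are the ones flagged.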
7/18	
Emerging vs. established research
collaboration
•  Endogamy
– In social sciences: the practice or tendency of
marrying within a social group
– In research: collaboration within a group of authors
– Higher endogamy = more frequent collaboration
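\[
\mathrm{endo}(A) = \frac{|d(A)|}{\left|\bigcup_{a \in A} d(a)\right|}
\qquad
\mathrm{endo}(p) = \frac{\sum_{x \in L(p)} \mathrm{endo}(x)}{|L(p)|}
\]
p     Publication
A     Set of authors
d(A)  Papers coauthored by authors in A
L(p)  Set of all subsets of the authors of p with at least two authors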
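A minimal sketch of the endogamy measure follows. It reads endo(A) as the number of papers coauthored by all authors in A divided by the number of papers written by any of them, and endo(p) as the mean of endo(x) over all author subsets x of p with at least two members. The function names, the `papers_by_author` mapping and the toy data are hypothetical. Returning 0.0 for a single-author paper is an assumption, as is leaving it to the caller whether p itself is included in the publication sets.

```python
from itertools import combinations

def endogamy_of_group(authors, papers_by_author):
    """endo(A): papers shared by every author in A over papers by any author in A."""
    paper_sets = [papers_by_author[a] for a in authors]
    shared = set.intersection(*paper_sets)   # d(A)
    union = set.union(*paper_sets)
    return len(shared) / len(union) if union else 0.0

def endogamy_of_publication(authors, papers_by_author):
    """endo(p): mean endo(x) over all subsets x of the authors with |x| >= 2."""
    authors = sorted(authors)
    subsets = [x for k in range(2, len(authors) + 1)
               for x in combinations(authors, k)]
    if not subsets:
        return 0.0  # assumption: a single-author paper has no collaboration to measure
    return sum(endogamy_of_group(x, papers_by_author) for x in subsets) / len(subsets)

# Hypothetical publication records; paper "p1" is written by all three authors.
papers_by_author = {
    "ann": {"p1", "p2", "p3"},
    "bob": {"p1", "p2"},
    "cat": {"p1", "p4"},
}
endo_p1 = endogamy_of_publication(["ann", "bob", "cat"], papers_by_author)
print(round(endo_p1, 3))
```

The number of subsets grows exponentially with the number of authors, so a practical implementation would probably need to cap or sample the subsets for papers with long author lists.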
8/18	
Interdisciplinary vs. intradisciplinary
research collaboration
•  Semantic distance of publication authors
– Higher author distance indicates more distant
communities
– Author publication record considered as a single
text
\[
\operatorname{a\_dist}(p) = \frac{1}{|A(p)|\,(|A(p)| - 1)} \sum_{a_1 \in A(p),\; a_2 \in A(p),\; a_1 \neq a_2} \mathrm{dist}(a_1, a_2)
\]
p     Publication
A(p)  Set of authors of p
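The author distance can be sketched as the mean pairwise distance between the authors of a paper, with each author represented by their publication record concatenated into a single text. The slides do not fix the distance function here, so the cosine distance over raw term counts used below is an assumption, and all function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import permutations

def term_vector(text):
    """Bag-of-words term counts for one author's concatenated publication record."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse term-count vectors (assumed dist function)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return 1.0 - dot / norm if norm else 1.0

def author_distance(author_texts):
    """Mean distance over all ordered pairs of distinct authors of one publication."""
    vectors = [term_vector(t) for t in author_texts]
    n = len(vectors)
    if n < 2:
        return 0.0  # assumption: distance is undefined for one author, report 0
    total = sum(cosine_distance(u, v) for u, v in permutations(vectors, 2))
    return total / (n * (n - 1))

# Hypothetical records: two authors from the same field, then two from unrelated fields.
same_field = author_distance(["citation networks analysis", "citation networks analysis"])
different_fields = author_distance(["citation networks", "protein folding"])
print(same_field < different_fields)
```

Summing over ordered pairs and dividing by n(n - 1) gives the same value as averaging over unordered pairs, because the distance is symmetric.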
9/18	
Types of collaboration when writing a
paper
                      High endogamy                                Low endogamy
High author distance  Established interdisciplinary collaboration  Emerging interdisciplinary collaboration
Low author distance   Expert group                                 Emerging expert collaboration
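The four collaboration types can be expressed as a small lookup on the two measures. The slides do not say where "high" and "low" are separated, so the thresholds below are hypothetical parameters supplied by the caller, and the function name is illustrative only.

```python
def collaboration_type(endogamy, author_distance, endogamy_threshold, distance_threshold):
    """Map a paper's endogamy and author distance to one of the four collaboration types."""
    high_endogamy = endogamy >= endogamy_threshold          # hypothetical cut-off
    high_distance = author_distance >= distance_threshold   # hypothetical cut-off
    if high_distance:
        if high_endogamy:
            return "Established interdisciplinary collaboration"
        return "Emerging interdisciplinary collaboration"
    if high_endogamy:
        return "Expert group"
    return "Emerging expert collaboration"

# Example call with made-up values and thresholds.
label = collaboration_type(endogamy=0.1, author_distance=0.8,
                           endogamy_threshold=0.5, distance_threshold=0.5)
print(label)
```

One way to choose the thresholds would be to use the median of each measure over the dataset, but that is a design choice rather than something stated in the slides.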
10/18	
Experiment
•  What is the distribution of the four different
types of collaboration in scholarly literature?
•  CORE (core.ac.uk) used as a dataset
– Cross-discipline
– Enables sampling by authors and institutions
– Selected sample
    •  Fulltext documents from Open Research Online
    repository (ORO)
    •  All other fulltext publications of the authors from ORO
    found in CORE
11/18	
Dataset statistics
Fulltext articles from ORO                   4,207
Number of authors                            8,473
Average number of publications per author    7.61
Max number of publications per author        310
Average number of authors per publication    4.31
Max number of authors per publication        25
Average number of received citations         0.30
Average number of collaborators              80.23
Total number of publications                 30,484
12/18	
Endogamy and author distance distribution
13/18	
Endogamy	and	author	distance	vs.	number	
of	authors
14/18	
Relation between author distance and
endogamy
[Figure; quadrant labels: Established interdisciplinary collaboration, Expert group, Emerging interdisciplinary collaboration, Emerging expert collaboration]
15/18	
Types of research collaboration
16/18	
Types of research collaboration and
“impact”
17/18	
Conclusions
•  Fulltext necessary
•  Semantometrics are a new class of methods
•  We showed one method to recognise types of
scholarly collaboration
18/18	
References
•  Xiaolin Shi, Jure Leskovec, and Daniel A.
McFarland (2010) Citing for High Impact
•  M. E. J. Newman (2004) Coauthorship
networks and patterns of scientific
collaboration
•  R. Lambiotte and P. Panzarasa (2009)
Communities, knowledge creation, and
information diffusion

Semantometrics in Coauthorship Networks: Fulltext-based Approach for Analysing Patterns of Research Collaboration