The use of graph theory for analyzing network-like data has gained central importance with the rise of Web 2.0. However, many graph-based techniques are neither well disseminated nor explored to their full potential, a gap that a complementary approach combining multiple techniques can address. This paper describes the systematic use of graph-based techniques of different types (multimodal), combining the resulting analytical insights around a common domain, the Digital Bibliography & Library Project (DBLP). To do so, we introduce an analytical ensemble based on statistical (degree and weakly-connected-component distributions), topological (average clustering coefficient and effective-diameter evolution), algorithmic (link prediction/machine learning), and algebraic techniques to inspect non-evident features of DBLP, while interpreting the heterogeneous discoveries made along the way. As a result, we have assembled a set of techniques, demonstrated over DBLP, into what we call multimodal analysis, an innovative process of information understanding that demands broad technical knowledge and a deep understanding of the data domain. We expect that our methodology and our findings will foster other multimodal analyses and shed light on Computer Science research.
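As an illustration of the statistical techniques named above (degree distribution and weakly-connected components), here is a minimal Python sketch over a toy edge list. The edge list is hypothetical, not DBLP data, and the functions are generic stand-ins for the paper's analysis:

```python
from collections import Counter, deque

def degree_distribution(edges):
    """Count how many nodes have each degree (undirected edge list)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())  # maps degree -> number of nodes

def weakly_connected_components(edges):
    """BFS over the undirected view of the graph; returns component sizes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)

edges = [("a", "b"), ("b", "c"), ("d", "e")]  # toy co-authorship links
degree_distribution(edges)             # Counter({1: 4, 2: 1})
weakly_connected_components(edges)     # [3, 2]
```

On a co-authorship graph such as DBLP's, the degree distribution is typically inspected on log-log axes for power-law behavior, and the size of the largest weakly-connected component tracks the emergence of a giant connected community.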
Porosity Calculation Using Techlog Software (Mohamed Qasim)
Calculation of the effective porosity using sonic log theory in Techlog software; this presentation also shows how to visualize the data in star plots.
A Dense Depth Representation for VLAD Descriptors in... (Federico Magliani)
Recent advances in deep learning have improved performance on image retrieval tasks. Through the many convolutional layers of a Convolutional Neural Network (CNN), it is possible to obtain a hierarchy of features from the evaluated image. At every step, the extracted patches are smaller and more representative than at the previous levels. Following this idea, this paper introduces a new detector applied to the feature maps extracted from a pre-trained CNN. Specifically, this approach increases the number of features in order to improve the performance of aggregation algorithms such as the widely used VLAD embedding. The proposed approach is tested on several public datasets: Holidays, Oxford5k, Paris6k, and UKB.
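The VLAD embedding mentioned above aggregates many local descriptors into one fixed-length vector: each descriptor is assigned to its nearest cluster center, and the residuals to each center are summed and L2-normalized. A minimal NumPy sketch (the descriptors and centers below are toy values, not CNN feature maps):

```python
import numpy as np

def vlad(descriptors, centers):
    """VLAD aggregation: sum of residuals to the nearest cluster center,
    one block per center, then global L2 normalization."""
    k, d = centers.shape
    v = np.zeros((k, d))
    # assign each descriptor to its nearest center (squared Euclidean)
    assign = np.argmin(
        ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    for i, x in zip(assign, descriptors):
        v[i] += x - centers[i]          # accumulate the residual
    v = v.ravel()                       # flatten to a k*d vector
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

descriptors = np.array([[0.0, 0.0], [2.0, 2.0]])   # toy local features
centers = np.array([[0.0, 1.0], [2.0, 1.0]])       # toy k-means centers
embedding = vlad(descriptors, centers)             # unit-norm vector of length 4
```

The paper's contribution, as described in the abstract, is a denser detector over CNN feature maps; the densely extracted features would then feed an aggregation like this one.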
A Multimodal Discourse Analysis of Video Games (Toh Weimin)
This is a presentation of my PhD dissertation at the International Conference on Narrative 2016 at the University of Amsterdam on 17 June 2016 from 1:15 - 2:45 pm (Panel G7 - Narrative and Video Game Characters: Perspectives on Cognition, Meaning-making, and Subjectivity)
Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior to that of the competitors for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular concerning images captured with mobile devices.
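Extracting local color and texture features per superpixel, as SmokeBlock does, can be sketched in a few lines. This is a generic illustration, not the paper's actual feature set: per-region mean color plus intensity variance as a crude texture proxy, over a hypothetical label map such as one produced by a superpixel algorithm:

```python
import numpy as np

def region_features(image, labels):
    """Per-region color mean and intensity variance (a crude texture proxy).
    `image`: H x W x 3 float array; `labels`: H x W array of region ids."""
    feats = {}
    for r in np.unique(labels):
        pix = image[labels == r]            # pixels of region r, shape (n, 3)
        gray = pix.mean(axis=1)             # per-pixel intensity
        feats[r] = np.concatenate([pix.mean(axis=0),   # mean R, G, B
                                   [gray.var()]])      # texture proxy
    return feats
```

A classifier over such per-region feature vectors can then label each superpixel as smoke or non-smoke, which yields both detection and a coarse segmentation.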
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat... (Universidade de São Paulo)
Given a very large dataset of moderate-to-high dimensionality, how to mine useful patterns from it? In such cases, dimensionality reduction is essential to overcome the "curse of dimensionality". Although there exist algorithms to reduce the dimensionality of Big Data, unfortunately, they all fail to identify and eliminate non-linear correlations between attributes. This paper tackles the problem by exploring concepts of Fractal Theory and massive parallel processing to present Curl-Remover, a novel dimensionality reduction technique for very large datasets. Our contributions are: Curl-Remover eliminates linear and non-linear attribute correlations as well as irrelevant attributes; it is unsupervised and suits analytical tasks in general, not only classification; it presents linear scale-up; it does not require the user to guess the number of attributes to be removed; and it preserves the attributes' semantics. We performed experiments on synthetic and real data spanning up to 1.1 billion points, and Curl-Remover outperformed a PCA-based algorithm, being up to 8% more accurate.
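Fractal-based feature selection builds on estimating a dataset's intrinsic (fractal) dimension. As a rough illustration of that underlying quantity, here is a minimal box-counting estimator in NumPy. This generic estimator is a stand-in for illustration only, not Curl-Remover's actual algorithm:

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the box-counting (fractal) dimension of a point cloud:
    the slope of log(#occupied boxes) versus log(1 / box_size)."""
    counts = []
    for s in scales:
        # snap each point to a grid of cell size s, count occupied cells
        boxes = np.unique(np.floor(points / s), axis=0)
        counts.append(len(boxes))
    logs = np.log(1.0 / np.asarray(scales))
    slope, _ = np.polyfit(logs, np.log(counts), 1)
    return slope

# a line embedded in 2-D has intrinsic dimension ~1, not 2
t = np.linspace(0.0, 1.0, 1000)
pts = np.column_stack([t, t])
box_counting_dimension(pts, [0.1, 0.05, 0.025, 0.0125])  # close to 1.0
```

The gap between embedding dimension (here 2) and intrinsic dimension (here about 1) is what signals correlated, removable attributes.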
Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested in large graphs with hundreds of thousands of nodes. Such graphs bring two challenges: the first is that any straightforward interactive manipulation will be prohibitively slow; the second is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed by the vast volume of information, because the screen would be too cluttered as nodes and edges overlap one another. The GMine system addresses both issues by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of communities-within-communities and storing it into a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
Jose Rodrigues, Agma J M Traina, Caetano Traina Jr (2003) Frequency Plot and Relevance Plot to Enhance Visual Data Exploration In: XVI Brazilian Symposium on Computer Graphics and Image Processing 117-124 IEEE Press.
@inproceedings { DBLP:conf/sibgrapi/RodriguesTT03,
title = "Frequency Plot and Relevance Plot to Enhance Visual Data Exploration",
year = "2003",
author = "Jose Rodrigues and Agma J M Traina and Caetano Traina Jr",
booktitle = "XVI Brazilian Symposium on Computer Graphics and Image Processing",
pages = "117-124",
publisher = "IEEE Press",
doi = "10.1109/SIBGRA.2003.1240999",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf",
urllink = "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1240999&",
abstract = "We present two techniques aiming at exploring databases through multivariate visualizations. Both techniques intend to deal with the problem caused by the limited amount of elements that can be presented simultaneously in traditional visual exploration procedures. The first technique, the Frequency Plot, combines data frequency with interactive filtering to identify clusters and trends in subsets of the database. Thus, graphical elements (lines, pixels, icons, or graphical marks) are color differentiated proportionally to how frequent the value being represented is, while interactive filtering allows the selection of interesting partitions of the database. The second technique, the Relevance Plot, corresponds to assigning different levels of color distinguishably to visual elements according to their relevance to a user's specified data properties set, which can be chosen visually and dynamically.",
keywords = "Computer science , Data analysis , Data visualization , Filtering , Frequency , Humans , Image databases , Information retrieval , Layout , Visual databases"}
Jose Rodrigues, Agma J M Traina, Christos Faloutsos, Caetano Traina Jr (2006) SuperGraph Visualization In: 8th IEEE International Symposium on Multimedia 227-234 IEEE Press.
@inproceedings { DBLP:conf/ism/RodriguesTFT06,
title = "SuperGraph Visualization",
year = "2006",
author = "Jose Rodrigues and Agma J M Traina and Christos Faloutsos and Caetano Traina Jr",
booktitle = "8th IEEE International Symposium on Multimedia",
pages = "227-234",
publisher = "IEEE Press",
doi = "10.1109/ISM.2006.143",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ISM2006.pdf",
urllink = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4061172",
abstract = "Given a large social or computer network, how can we visualize it, find patterns, outliers, communities? Although several graph visualization tools exist, they cannot handle large graphs with hundred thousand nodes and possibly million edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.",
keywords = "Application software , Bipartite graph , Computer networks , Computer science , Data structures , Scalability , Technological innovation , Tree graphs , Visualization , Web pages"}
On the Support of a Similarity-Enabled Relational Database Management System ... (Universidade de São Paulo)
Crowdsourcing solutions can be helpful for extracting information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations, some of which also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on data-supported crisis management; nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over an RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make this possible, similarity-based operations were implemented within a popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Finally, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
StructMatrix: large-scale visualization of graphs by means of structure detec... (Universidade de São Paulo)
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, such as cliques, bipartite cores, stars, and chains? Furthermore, how to visualize such patterns altogether, gaining insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large scale. Hence, this paper describes StructMatrix, a methodology aimed at highly scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments on real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia, and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been reported in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging its use for decision making.
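Detecting structures such as cliques and stars can be illustrated by classifying a node's egonet by how many edges its neighbors share. This simple degree-pattern test is a hypothetical illustration of structure detection in general, not StructMatrix's actual detector:

```python
def egonet_label(adj, node):
    """Classify a node's egonet as 'clique', 'star', or 'other' by
    counting the edges among its neighbors. `adj`: node -> set of neighbors."""
    nbrs = adj[node]
    n = len(nbrs)
    # each inner edge is seen from both endpoints, so divide by 2
    inner = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    if n >= 2 and inner == n * (n - 1) // 2:
        return "clique"          # neighbors fully connected among themselves
    if n >= 2 and inner == 0:
        return "star"            # neighbors connected only through the hub
    return "other"

star = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
egonet_label(star, "h")          # 'star'
```

Aggregating such labels over all nodes, and plotting them over the adjacency matrix, is the kind of cardinality-and-distribution summary the abstract describes.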
Link recommendation has gained attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed to consider solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics in order to achieve prospective link recommendations under time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether a pair of authors will/should define a link. We experimented with five algorithms, which allowed us to reach high accuracy and to evaluate different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic as it may assist in identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
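Typical topological metrics fed to such binary classifiers include common neighbors, Jaccard similarity, and Adamic-Adar. A minimal sketch, assuming a plain adjacency-set representation (the specific metric set used by the paper is not stated in the abstract):

```python
import math

def topological_features(adj, u, v):
    """Features for a candidate link (u, v): common-neighbor count,
    Jaccard coefficient, and Adamic-Adar score.
    `adj` maps each node to its set of neighbors."""
    cn = adj[u] & adj[v]
    union = adj[u] | adj[v]
    # Adamic-Adar: shared low-degree neighbors count more
    aa = sum(1.0 / math.log(len(adj[w])) for w in cn if len(adj[w]) > 1)
    return (len(cn), len(cn) / len(union) if union else 0.0, aa)
```

Each candidate author pair becomes one feature vector; positive examples are pairs that do co-author later, and any binary classifier (the abstract mentions five) can then be trained on them.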
Techniques for effective and efficient fire detection from social media images (Universidade de São Paulo)
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in crisis situations (such as accidents, explosions, and fires). However, these services produce images at a rate that is impossible for human beings to absorb and analyze; thus, methods for automatic analysis are required. Moreover, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. To this end, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval (FFDnR) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with the MPEG-7 feature extractors Color Structure and Scalable Color, and with the distance functions City-Block and Euclidean. Our work shall provide a basis for further developments regarding the monitoring of images from social media.
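The instance-based learning mentioned above reduces, at its core, to ranking database descriptors by a distance function such as City-Block (L1) or Euclidean (L2). A minimal nearest-neighbor sketch with both metrics (toy descriptors, not the paper's MPEG-7 features):

```python
import numpy as np

def knn(query, db, k=3, metric="cityblock"):
    """Return indices of the k database descriptors closest to the query."""
    diff = db - query
    if metric == "cityblock":
        dist = np.abs(diff).sum(axis=1)            # L1 / City-Block
    else:
        dist = np.sqrt((diff ** 2).sum(axis=1))    # L2 / Euclidean
    return np.argsort(dist)[:k]

db = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # toy image descriptors
knn(np.array([0.0, 0.0]), db, k=2)                   # nearest two indices
```

Labeling a query image by the labels of its nearest annotated neighbors (fire / non-fire) is exactly the instance-based classification scheme the architecture evaluates across descriptor and distance combinations.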
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... (Universidade de São Paulo)
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of focus is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that often do not hold for videos acquired with hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates in three aspects: (1) it relies on a specifically tailored color model, named Fire-like Pixel Detector, able to improve the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video but also the segments of the video where fire occurs. We tested our proposal on two video datasets with different characteristics and summarize the results to demonstrate superior efficacy, in terms of true positives and negatives, compared to state-of-the-art methods.
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem, because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we designed a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge of its structure, dismissing the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model (Universidade de São Paulo)
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work we introduce Bimodal Block Processing (BBP), an innovation able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs (Universidade de São Paulo)
Inference problems on networks and their algorithms have always been important subjects, but even more so now, with so much data available and so little time to make sense of it. Common applications range from product recommendation to social networks and protein interaction. One of the main inferences in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network towards unlabeled nodes. While there is a widely used algorithm for this context, called Belief Propagation (BP), it lacks the necessary convergence guarantees for loopy networks. More recently, an alternative method called LinBP was proposed; while it solved the convergence issue, scalability for large graphs that do not fit in memory remains a challenge. Additionally, most works that try to use BP on large-scale graphs rely on specific infrastructure such as supercomputers and computational clusters. Therefore, we propose a new algorithm that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with the state-of-the-art BP alternative LinBP, to provide a scalable framework for large graph inference that runs on a single commodity machine. Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on the tested networks, while achieving the same accuracy. We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters. Finally, we believe that our methodology highlights the not yet fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
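The linearized propagation at the heart of LinBP-style inference can be sketched as a simple fixed-point iteration: final beliefs equal explicit (prior) beliefs plus neighbor beliefs pushed through a coupling matrix. This is a minimal in-memory sketch under the assumption that the coupling is weak enough to converge, not the paper's asynchronous vertex-centric implementation:

```python
import numpy as np

def linbp_sketch(A, E, H, iters=50):
    """LinBP-style iteration: B = E + A @ B @ H, repeated to a fixed point.
    A: n x n adjacency matrix, E: n x k explicit (prior) beliefs,
    H: k x k coupling matrix (converges when the coupling is weak)."""
    B = E.copy()
    for _ in range(iters):
        B = E + A @ B @ H    # prior plus propagated neighbor beliefs
    return B

A = np.array([[0.0, 1.0], [1.0, 0.0]])          # two connected nodes
E = np.array([[0.1, -0.1], [0.0, 0.0]])         # node 0 leans to class 0
H = 0.1 * np.array([[1.0, -1.0], [-1.0, 1.0]])  # homophily coupling
B = linbp_sketch(A, E, H)   # node 1 inherits node 0's leaning
```

The asynchronous variant proposed in the work updates vertices with the freshest neighbor beliefs instead of synchronizing full matrix sweeps, which is why it converges in fewer iterations.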
A multimodal discourse analysis of video games (toh weimin)Toh Weimin
This is a presentation of my PhD dissertation at the International Conference on Narrative 2016 at the University of Amsterdam on 17 June 2016 from 1:15 - 2:45 pm (Panel G7 - Narrative and Video Game Characters: Perspectives on Cognition, Meaning-making, and Subjectivity)
Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method that is able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art methods for feature extraction. Our method achieved performance superior than the competitors, for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular, concerning images captured with mobile devices.
Effective and Unsupervised Fractal-based Feature Selection for Very Large Dat...Universidade de São Paulo
Given a very large dataset of moderate-to-high di-
mensionality, how to mine useful patterns from it? In such
cases, dimensionality reduction is essential to overcome the
“curse of dimensionality”. Although there exist algorithms to
reduce the dimensionality of Big Data, unfortunately, they
all fail to identify/eliminate non-linear correlations between
attributes. This paper tackles the problem by exploring con-
cepts of the Fractal Theory and massive parallel processing
to present Curl-Remover, a novel dimensionality reduction
technique for very large datasets. Our contributions are: Curl-
Remover eliminates linear and non-linear attribute correlations
as well as irrelevant ones; it is unsupervised and suits for
analytical tasks in general – not only classification; it presents
linear scale-up; it does not require the user to guess the
number of attributes to be removed, and; it preserves the
attributes’ semantics. We performed experiments on synthetic
and real data spanning up to 1.1 billion points and Curl-
Remover outperformed a PCA-based algorithm, being up to
8% more accurate.
Several graph visualization tools exist. However, they are not able to handle large graphs, and/or they do not allow interaction. We are interested on large graphs, with hundreds of thousands of nodes. Such graphs bring two challenges: the first one is that any straightforward interactive manipulation will be prohibitively slow. The second one is sensory overload: even if we could plot and replot the graph quickly, the user would be overwhelmed with the vast volume of information because the screen would be too cluttered as nodes and edges overlap each other. GMine system addresses both these issues, by using summarization and multi-resolution. GMine offers multi-resolution graph exploration by partitioning a given graph into a hierarchy of com-munities-within-communities and storing it into a novel R-tree-like structure which we name G-Tree. GMine offers summarization by implementing an innovative subgraph extraction algorithm and then visualizing its output.
http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf
Jose Rodrigues, Agma J M Traina, Caetano Traina Jr (2003) Frequency Plot and Relevance Plot to Enhance Visual Data Exploration In: XVI Brazilian Symposium on Computer Graphics and Image Processing 117-124 IEEE Press.
@inproceedings { DBLP:conf/sibgrapi/RodriguesTT03,
title = "Frequency Plot and Relevance Plot to Enhance Visual Data Exploration",
year = "2003",
author = "Jose Rodrigues and Agma J M Traina and Caetano Traina Jr",
booktitle = " XVI Brazilian Symposium on Computer Graphics and Image Processing",
pages = "117-124",
publisher = "IEEE Press",
doi = "10.1109/SIBGRA.2003.1240999",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al_Frequency_Plot-SIBGRAPI2003.pdf",
urllink = "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1240999&",
abstract = "We present two techniques aiming at exploring databases through multivariate visualizations. Both techniques intend to deal with the problem caused by the limited amount of elements that can be presented simultaneously in traditional visual exploration procedures. The first technique, the Frequency Plot, combines data frequency with interactive filtering to identify clusters and trends in subsets of the database. Thus, graphical elements (lines, pixels, icons, or graphical marks) are color differentiated proportionally to how frequent the value being represented is, while interactive filtering allows the selection of interesting partitions of the database. The second technique, the Relevance Plot, corresponds to assigning different levels of color distinguishably to visual elements according to their relevance to a user's specified data properties set, which can be chosen visually and dynamically.",
keywords = "Computer science , Data analysis , Data visualization , Filtering , Frequency , Humans , Image databases , Information retrieval , Layout , Visual databases"}
Jose Rodrigues, Agma J M Traina, Christos Faloutsos, Caetano Traina Jr (2006) SuperGraph Visualization In: 8th IEEE International Symposium on Multimedia 227-234 IEEE Press.
@inproceedings { DBLP:conf/ism/RodriguesTFT06,
title = "SuperGraph Visualization",
year = "2006",
author = "Jose Rodrigues and Agma J M Traina and Christos Faloutsos and Caetano Traina Jr",
booktitle = "8th IEEE International Symposium on Multimedia",
pages = "227-234",
publisher = "IEEE Press",
doi = "10.1109/ISM.2006.143",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ISM2006.pdf",
urllink = "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4061172",
abstract = "Given a large social or computer network, how can we visualize it, find patterns, outliers, communities? Although several graph visualization tools exist, they cannot handle large graphs with hundred thousand nodes and possibly million edges. Such graphs bring two challenges: interactive visualization demands prohibitive processing power and, even if we could interactively update the visualization, the user would be overwhelmed by the excessive number of graphical items. To cope with this problem, we propose a formal innovation on the use of graph hierarchies that leads to GMine system. GMine promotes scalability using a hierarchy of graph partitions, promotes concomitant presentation for the graph hierarchy and for the original graph, and extends analytical possibilities with the integration of the graph partitions in an interactive environment.",
keywords = "Application software , Bipartite graph , Computer networks , Computer science , Data structures , Scalability , Technological innovation , Tree graphs , Visualization , Web pages"}
On the Support of a Similarity-Enabled Relational Database Management System ...Universidade de São Paulo
Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a similarity-enabled methodology together with a supporting architecture named Data-Centric Crisis Management (DCCM), which employs our methods over a RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make it possible, similarity-based operations were implemented within one popular, open-source RDBMS. Results using real data from Flickr show that the proposed methodology over DCCM is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. At last, given its accuracy and efficiency, we expect our work to provide a framework for further developments on crisis management solutions.
StructMatrix: large-scale visualization of graphs by means of structure detec...Universidade de São Paulo
Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, like cliques, bi-partite cores, stars, and chains? Furthermore, how to visualize such patterns altogether getting insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large-scale. Hence, this paper describes StructMatrix, a methodology aimed at high-scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments in real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging their use for decision making.
Currently, link recommendation has gained more attention as networked data becomes abundant in several scenarios. However, existing methods for this task have failed in considering solely the structure of dynamic networks for improved performance and accuracy. Hence, in this work, we present a methodology based on the use of multiple topological metrics in order to achieve prospective link recommendations considering time constraints. The combination of such metrics is used as input to binary classification algorithms that state whether two pairs of authors will/should define a link. We experimented with five algorithms, what allowed us to reach high rates of accuracy and to evaluate the different classification paradigms. Our results also demonstrated that time parameters and the activity profile of the authors can significantly influence the recommendation. In the context of DBLP, this research is strategic as it may assist on identifying potential partners, research groups with similar themes, research competition (absence of obvious links), and related work.
Techniques for effective and efficient fire detection from social media images - Universidade de São Paulo
Social media provides information, in the form of images, that is valuable to a vast set of human activities, including salvage and rescue in the case of crisis situations (such as accidents, explosions, and fire). However, these services produce images at a rate that is impossible for human beings to absorb and analyze; thus, it is a requirement to have methods for automatic analysis. However, despite the multiple works on image analysis, there are no studies on the specific topic of fire detection over social media. To fill this gap, this work describes the use and the evaluation of an ample set of content-based image retrieval and classification techniques in the task of fire detection. To this end, we (1) built a ground-truth set of annotated images regarding fire occurrence; (2) engineered the Fast-Fire Detection and Retrieval (FFDnR) architecture to combine configurations of feature extractors and distance functions to work with instance-based learning; and (3) evaluated 36 image descriptors in the task of fire detection. Our results demonstrated that, for fire detection, the best image descriptors concerning efficacy (F-measure, Precision-Recall, and ROC) and processing efficiency (wall-clock time) are achieved with MPEG-7 feature extractors Color Structure and Scalable Color, and with distance functions City-Block and Euclidean. Our work shall provide a basis for further developments regarding monitoring of images from social media.
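The best-performing distance functions reported above, City-Block (L1) and Euclidean (L2), are simple to state; a minimal sketch of ranking candidate images by descriptor distance, using hypothetical feature vectors (the actual MPEG-7 descriptors are much longer):

```python
import math

def city_block(a, b):
    # L1 (Manhattan) distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # L2 distance: square root of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical color descriptors for a query image and two candidates
query = [0.2, 0.8, 0.1]
cand1 = [0.3, 0.7, 0.1]
cand2 = [0.9, 0.1, 0.5]

# Instance-based retrieval: rank candidates by distance (nearest first)
ranked = sorted([cand1, cand2], key=lambda c: city_block(query, c))
```

The same ranking loop works with `euclidean` swapped in, which is how different extractor/distance configurations can be compared.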
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and... - Universidade de São Paulo
The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the focused semantic event is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that are often in contrast to the videos acquired by hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model named Fire-like Pixel Detector able to improve the accuracy of fire detection, (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras, and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments in the video where fire occurs. We evaluated our proposal on two video datasets with different characteristics and summarize the results to demonstrate its superior efficacy, in terms of true positives and negatives, as compared to state-of-the-art methods.
Relational databases are rigid-structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and draw complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we designed a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, dispensing with the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model - Universidade de São Paulo
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To surpass their performance, in this work, we introduce the Bimodal Block Processing (BBP), an innovation that is able to boost graph computation by minimizing the I/O cost even further. With this strategy, we achieved the following contributions: (1) M-Flash, the fastest graph computation framework to date; (2) a flexible and simple programming model to easily implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup.
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs - Universidade de São Paulo
Inference problems on networks and their algorithms have always been important subjects, all the more so now with so much data available and so little time to make sense of it.
Common applications range from product recommendation to social networks and protein interaction.
One of the main inferences in these types of networks is the guilt-by-association method, where labeled nodes propagate their information throughout the network, towards unlabeled nodes.
While there is a widely used algorithm for this context, called Belief Propagation, it lacks the necessary convergence guarantees for loopy networks.
More recently, a new alternative method called LinBP was proposed; while it solved the convergence issue, scalability for large graphs that do not fit in memory remains a challenge.
Additionally, most works that try to use BP on large-scale graphs rely on specific infrastructure such as supercomputers and computational clusters.
Therefore, we propose a new algorithm that leverages state-of-the-art asynchronous vertex-centric parallel processing techniques in conjunction with the state-of-the-art BP alternative LinBP, to provide a scalable framework for large graph inference that runs on a single commodity machine.
Our results show that our algorithm is up to 200 times faster than LinBP's SQL implementation on the tested networks, while achieving the same accuracy rate.
We also show that, due to the asynchronous processing, our algorithm needs fewer iterations to converge than LinBP when using the same parameters.
Finally, we believe that our methodology highlights the not yet fully explored parallelism available on commodity machines, leaning towards a more cost-efficient computational paradigm.
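The linearized guilt-by-association idea behind LinBP can be sketched in a few lines (this is an illustrative toy, not the authors' vertex-centric implementation): each node's belief is its prior plus a damped sum of its neighbors' beliefs, iterated until convergence. The graph, priors, and damping factor `eps` below are assumptions for illustration.

```python
def propagate(adj, priors, eps=0.1, iters=50):
    """Linearized guilt-by-association propagation (in the spirit of LinBP):
    belief(node) = prior(node) + eps * sum of neighbor beliefs, iterated."""
    beliefs = dict(priors)
    for _ in range(iters):
        beliefs = {node: priors.get(node, 0.0)
                         + eps * sum(beliefs[n] for n in nbrs)
                   for node, nbrs in adj.items()}
    return beliefs

# Tiny path graph: node 0 labeled positive (+1), node 3 negative (-1),
# nodes 1 and 2 unlabeled
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
priors = {0: 1.0, 1: 0.0, 2: 0.0, 3: -1.0}
b = propagate(adj, priors)
# Unlabeled nodes inherit the sign of their nearest labeled seed
```

A small `eps` keeps the iteration contractive, which is exactly the convergence condition that plain loopy BP lacks.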
Larry will discuss what data science means in general, and more specifically at Udemy. He will describe some key data science frameworks, and what it means for them to be agile. He will also discuss ideally what it would mean to be a data scientist at Udemy.
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PI's labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
Webinar: A bi-objective multiperiod fuzzy scheduling for a multimodal urban t...BRTCoE
2015-02-05 by Paula Ávila
Get the video and more info here:
http://www.brt.cl/webinar-a-bi-objetive-multiperiod-fuzzy-scheduling-for-a-multimodal-urban-transport-system
EzPAARSE is open source software that analyses your locally gathered proxy logfiles and provides you with COUNTER-deduplicated, KBART-formatted and geolocalised reports of your users’ accesses to subscribed e-resources. Come and watch us demo it live to understand how it works and learn how to install it in your institution for producing your own enriched measures and indicators.
Conference: 23rd ICE/IEEE ITMC Conference (ICE2017), Madeira, Portugal, June 27-30, 2017
Title of the paper: An Approach to Production Scheduling Optimization: A Case of an Oil Lubrication and Hydraulic Systems Manufacturer
Authors: Artem Katasonov, Toni Lastusilta, Timo Korvola, Leila Saari, Dan Bendas, Roberto Camp, Wael M. Mohammed, Angelica Nieto Lee
If you would like to receive a reprint of the original paper, please contact us.
Model-Based Optimization for Effective and Reliable Decision-Making - Bob Fourer
Optimization originated as an advanced mathematical technique, but it has become an accessible and widely used decision-making tool. A key factor in the spread of successful optimization applications has been the adoption of a model-based approach: A domain expert or operations analyst focuses on modeling the problem of interest, while the computation of a solution is left to general-purpose, off-the-shelf solvers; powerful yet intuitive modeling software manages the difficulties of translating between the human modeler’s formulation and the solver software’s needs. This talk introduces model-based optimization by contrasting it to a method-based approach that relies on customized implementation of rules and algorithms. Model-based implementations are illustrated using the AMPL modeling language and popular solvers. The presentation concludes by surveying the variety of modeling languages and solvers available for model-based optimization today.
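The model-based separation of concerns described above can be illustrated with a toy example (not AMPL, and deliberately simplistic): the problem is declared purely as data, and a generic solver that knows nothing about the domain consumes it. All coefficients below are hypothetical.

```python
from itertools import product

# "Model-based" setup: the problem is a declaration (objective + constraints),
# with no algorithmic rules embedded in it. Coefficients are hypothetical.
model = {
    "objective": lambda x, y: 3 * x + 2 * y,   # maximize this
    "constraints": [
        lambda x, y: x + y <= 4,
        lambda x, y: x <= 3,
    ],
}

def solve(model, bounds=range(0, 5)):
    """Generic brute-force solver over an integer grid: it works for any
    model with this shape, mirroring how off-the-shelf solvers are reused."""
    feasible = [(x, y) for x, y in product(bounds, repeat=2)
                if all(c(x, y) for c in model["constraints"])]
    return max(feasible, key=lambda p: model["objective"](*p))

best = solve(model)   # (3, 1): objective value 11
```

Swapping in a different model dict requires no solver changes, which is the point of the model-based approach; a method-based approach would have baked the constraint logic into the search itself.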
Talk given at the International Conference on Cognitive Modelling, University of Groningen, on 10 April 2015.
CC0 - Public Domain
To the extent possible under law, Caspar Addyman has waived all copyright and related or neighboring rights to Open science in cognitive modeling. This work is published from: United Kingdom.
Our research demonstrates how data assimilation can be used, with a non-hydrostatic coastal ocean model, to study sub-mesoscale processes and accurately estimate the state variables. The implementation is non-trivial for physical ocean models, which are highly nonlinear, sensitive to perturbations, and require a dense spatial discretization in order to correctly reproduce the dynamics. A major challenge of this approach is the high computational cost incurred by a high-resolution numerical model with a three-dimensional data assimilation scheme in a complicated stratified system. Interfacing the General Curvilinear Coastal Ocean Model (GCCOM) with the faster data assimilation framework, NCAR Data Assimilation Research Testbed (DART), allowed us to assimilate very high resolution observations into the system. Observing System Simulation Experiments (OSSEs) in very steep seamount test cases are presented. These were used to explore the proper initial ensemble members for the model, estimate the observation error variance needed to reproduce the dynamics in a turbulent flow experiment, and to analyze the impact of localization in such small processes. Our results demonstrate that the DART-GCCOM model can assimilate high resolution observations (tenths of meters) using as few as 30 ensemble members.
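The ensemble assimilation idea can be sketched in its simplest scalar form (a generic EnKF-style update, not DART's actual scheme): the Kalman gain weighs the ensemble's forecast spread against the observation error variance. This sketch omits observation perturbations and assumes an identity observation operator; all numbers are illustrative.

```python
import statistics

def enkf_update(ensemble, obs, obs_var):
    """One scalar ensemble Kalman analysis step (simplified sketch)."""
    p = statistics.variance(ensemble)   # forecast error variance from spread
    k = p / (p + obs_var)               # Kalman gain
    # Pull every member toward the observation, weighted by the gain
    return [x + k * (obs - x) for x in ensemble]

# 30 ensemble members spread around 10 (hypothetical), observation at 12
forecast = [10 + 0.1 * i for i in range(30)]
analysis = enkf_update(forecast, obs=12.0, obs_var=0.5)
# The analysis mean moves toward the observation, and the spread shrinks
```

With few members (like the 30 above), the sample covariance is noisy, which is why localization matters in practice.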
- Complexity of climate systems
- Climate modelling
- The need for modelling
- System thinking
- Analytical vs Numerical modeling
- Mathematical models
- Modeling process and model selection
- Model Uncertainty
- Modeling application and tools
We participated in the MediaEval Benchmark, whose goal is to concentrate on multimodal geo-location prediction on the Yahoo! Flickr Creative Commons 100M dataset – the placing task. It challenges participants to develop models and/or techniques to estimate the geographic locations of Flickr resources based on textual metadata, e.g. titles, descriptions and tags. We aim to find a procedure that is easy to understand, simple to implement, and flexible enough to integrate different techniques. In this paper, we present a three-step approach to tackle the locale-based sub-task.
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
Information about the study and career opportunities promoted by the Computer Science program of the Universidade de São Paulo, São Carlos campus.
Introduction to the Business Intelligence tools of the Hadoop ecosystem:
Business Intelligence and Big Data
Big Data warehousing
Architecture of a data warehouse
Hadoop and Apache Hive
Extract Transform Load
Data warehouse vs. operational database
OLAP – Online Analytical Processing
Apache Kylin
Conventional OLAP solutions
Advanced analytics with Apache Mahout
MetricSPlat - A platform for quick development, testing and visualization of... - Universidade de São Paulo
Jose Rodrigues, Luciana A S Romani, Luciana Zaina, Ricardo Ciferri (2009) MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques In: Simpósio Brasileiro de Bancos de Dados - SBBD2009 1-6.
@inproceedings { RodriguesSBBD09,
title = "MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques",
year = "2009",
author = "Jose Rodrigues and Luciana A S Romani and Luciana Zaina and Ricardo Ciferri",
booktitle = "Simpósio Brasileiro de Bancos de Dados - SBBD2009",
pages = "1-6",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesSBBD09-MetricSPlat.pdf",
urllink = "http://www.icmc.usp.br/~junio/MetricSPlat/index.htm",
abstract = "The development and testing of content-based data retrieval systems is a time-consuming task. Over the concept of metric space, such systems must integrate the three factors that define an indexing environment. These factors are features extraction, metric structures and distance functions, not to mention a suitable user interface. This integration deviates the work from the real focus of research, suppressing quick experimentation of ideas. In this context, we present the Metric Space Platform (MetricSPlat), a system designed for content-based retrieval enabled with plug-in features. With minimal effort, MetricSPlat substantially speeds up the experimentation of new techniques by providing a well-defined framework aided with interactive data visualization techniques.",
note = "8 pages",
keywords = "visualization, content-based data retrieval"}
Hierarchical visual filtering pragmatic and epistemic actions for database vi... - Universidade de São Paulo
Jose Rodrigues, Carlos E Cirilo, Luciana A M Zaina, Antonio F Prado (2013) Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization In: Proceedings of the ACM Symposium on Applied Computing, ACM Press, 946-952.
@inproceedings { ref35,
title = "Hierarchical Visual Filtering, pragmatic and epistemic actions for database visualization",
year = "2013",
author = "Jose Rodrigues and Carlos E Cirilo and Luciana A M Zaina and Antonio F Prado",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",
editor = "A C M Press",
pages = "946-952",
publisher = "ACM Press",
doi = "10.1145/2480362.2480545",
url = "http://www.icmc.usp.br/~junio/PublishedPapers/RodriguesJr_et_al-ACMSAC2013.pdf",
urllink = "http://www.icmc.usp.br/~junio/VisTree/VisTree.htm",
abstract = "Visualization techniques of all sorts suffer from visual cluttering, the occlusion of visual information due to the overlap of graphical items, and from excessive complexity in analytical tasks due to multiple parallel perspectives drawn from the data at hand. To cope with these problems, we introduce Hierarchical Visual Filtering, a novel interaction principle that brings pragmatic and epistemic actions to visualization techniques. Pragmatic actions here mean that the analyst is able to visually select and filter information, determining visual configurations that reveal different perspectives of the data; epistemic actions mean that the analyst can record, annotate, and recall intermediate visualizations created over his pragmatic actions. To do so, we use a tree-like structure to keep multiple visualization workspaces linked according to the analytical decisions taken by the user. Our goal is to promote an innovative systematization that can augment the potential for database visual inspection, and for visualization systems in general. It is our contention that Hierarchical Visual Filtering can inspire a novel scheme of visualization environments in which space limitations and complexity are treated by means of interactive tasks.",
keywords = "Information Visualization, Multiple Views, Visual Data Analysis, Databases, Interactive Filtering, Hierarchical Filtering"}
Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses
1. Introduction Methodology Experiments Conclusions
Gabriel Perri Gimenes, Hugo Gualdron, Jose F Rodrigues Jr (1), Mario Gazziro (2)
(1) University of Sao Paulo, Av Trab Sao-carlense, 400, Sao Carlos, SP, Brazil - 13566-590
(2) Federal University of Santo Andre, Av dos Estados, 500, Santo Andre, SP, Brazil - 09210-580
{ggimenes,gualdron,junio}@icmc.usp.br, mario.gazziro@ufabc.edu.br
This work has financial support from Fapesp (2013/10026-7)
http://www.icmc.usp.br/pessoas/junio/Site/index.htm
The 30th ACM/SIGAPP Symposium On Applied Computing, 2015 1/21
Introduction
High demand for information about the behavior of scientists: from authors, editors, funding agencies, and society
Combining analytical techniques - the multimodal approach
Problem
Finding non-evident facts about DBLP is a non-trivial task
Single-technique approaches - limited analytical potential
Systematic process - can be applied to similar data from other domains
Hypothesis
The use of multiple analytical techniques, through a well-defined process, is capable of revealing important aspects of the scientific community in computer science
Materials
Cardinality of the entities extracted from the DBLP XML:
Entity         Count
Authors        1,060,221
Articles       1,801,576
Events         14,654
Publications   4,262
Data migration
Semi-structured format ⇒ Relational model
Need for specific software for the migration
Definition of the entity-relationship model
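The migration step can be sketched as follows, assuming a DBLP-like XML fragment (the element names mirror the real DBLP schema, but the records themselves are hypothetical):

```python
import xml.etree.ElementTree as ET

# Minimal DBLP-like XML fragment (hypothetical records; real DBLP uses
# <article>/<inproceedings> elements with nested <author> children)
xml_data = """
<dblp>
  <article key="a1">
    <author>Alice</author><author>Bob</author>
    <title>Graph Mining</title><year>2013</year>
  </article>
  <article key="a2">
    <author>Alice</author>
    <title>Link Prediction</title><year>2014</year>
  </article>
</dblp>
"""

# Migrate the semi-structured records into flat relational tuples:
# an Authors entity set and a writes(author, article_key) relationship
authors, writes = set(), []
for art in ET.fromstring(xml_data).iter("article"):
    key = art.get("key")
    for a in art.iter("author"):
        authors.add(a.text)
        writes.append((a.text, key))
```

From tuples like `writes`, the co-authorship relationship falls out as a self-join on the article key.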
Extracted relationships
Relationship    Description
Co-authorship   Authors that published an article together.
Co-edition      Authors that appear as editors in the same event or journal.
Multimodal Analysis - WCC
Weakly-connected components distribution - Co-authorship
13% small components with up to 30 nodes
Giant component with 87% of the authors
44,000 sub-networks of co-authorship - occasional researchers, industry white papers
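Component statistics like these can be computed with a standard union-find pass; a stdlib-only sketch on a toy co-authorship graph (for the undirected co-authorship network, weak and ordinary connectivity coincide):

```python
def component_sizes(edges):
    """Connected component sizes via union-find with path halving."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)          # union the two components
    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

# Toy co-authorship edges: one "giant" component and one isolated pair
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
sizes = component_sizes(edges)   # [4, 2]
```

The giant-component fraction reported above is then just `sizes[0] / sum(sizes)`.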
Multimodal Analysis - ACC
Node degree × average clustering coefficient - Co-authorship
High coefficient values are found in nodes with degree < 10
Coefficient value decreases as the node degree increases - ACC ∝ degree^(-1.06)
Authors tend to collaborate with the co-authors of their co-authors - triangles
Young authors vs. older authors
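The coefficient being averaged is the local clustering coefficient: the fraction of a node's neighbor pairs that are themselves connected (closed triangles). A minimal sketch on a toy neighborhood:

```python
def clustering(adj, node):
    """Local clustering coefficient of `node` in an adjacency-list graph."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0   # no neighbor pairs, coefficient defined as 0
    # Count neighbor pairs that are directly linked (closed triangles)
    links = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
                if v in adj[u])
    return 2 * links / (k * (k - 1))

# Toy co-authorship neighborhood: a, b, c form a triangle; d dangles off a
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
assert clustering(adj, "b") == 1.0   # both of b's co-authors collaborate
coeff_a = clustering(adj, "a")       # 1/3: only one of three pairs closed
```

Averaging this value over all nodes of a given degree produces the degree-vs-ACC curve discussed above.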
Multimodal Analysis - Densification
Degree distribution - Co-authorship
As new authors appear, new edges also appear - e(t) ∝ n(t)^1.47 - densification
Edges appear exponentially vs. publication of elaborated articles
Master's and Ph.D. as regular courses
Funding agencies - numbers
More authors per paper
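The densification exponent (the 1.47 above) is conventionally estimated as the least-squares slope of log(edges) versus log(nodes) across graph snapshots; a sketch with hypothetical snapshot counts:

```python
import math

def fit_exponent(nodes, edges):
    """Least-squares slope of log(edges) vs. log(nodes):
    the exponent alpha in e(t) ∝ n(t)^alpha."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical yearly snapshots following e = n^1.5 exactly
nodes = [100, 1000, 10000]
edges = [n ** 1.5 for n in nodes]
alpha = fit_exponent(nodes, edges)   # ≈ 1.5
```

An exponent above 1, as found for DBLP, means edges grow superlinearly in the number of nodes: the network densifies over time.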
Multimodal Analysis - Diameter
Effective diameter evolution - Co-edition
Peaked near 1995 - beginning of a shrinking period
Before that - new editors/publication vehicles vs. after that - same editors/same vehicles
Densification period: more new edges than new nodes - editor committees rotate among the same members
Editor: experience and expertise - limitations for new researchers
Multimodal Analysis - Predictability
Predictability analysis - Co-authorship
Can we predict new interactions in the DBLP network?
Extraction of topological features → supervised learning
Figure: Results - Interval G[1995, 2005], G[2006, 2007]
Multimodal Analysis - Counting and algebraic analysis
Counting - Bipartite author-article network with timestamps
Accomplishment: number of years with at least one publication
Silence: number of consecutive years with no publications
Multimodal Analysis - Counting and algebraic analysis
Proposed metric
Importance = (1 / √(Silence + 1)) · log(Accomplishment)
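The metric translates directly into code (the argument values below are hypothetical):

```python
import math

def importance(accomplishment, silence):
    """Proposed metric: rewards years with publications and penalizes
    consecutive years of silence."""
    return (1 / math.sqrt(silence + 1)) * math.log(accomplishment)

# A steadily active author scores above an equally accomplished
# but recently silent one
active = importance(accomplishment=10, silence=0)
dormant = importance(accomplishment=10, silence=3)
```

With silence = 0 the metric reduces to log(Accomplishment); each additional silent year shrinks it by the square-root factor.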
Conclusions
Well-defined analytical process - combination of multiple techniques
Non-trivial extraction of information from DBLP
Multi-perspective interpretations about the past and future of the academic community in computer science
Application in the decision-making process of funding agencies and academic personnel