MetaScience
A Holistic Approach for Research Modeling
Valerio Cosentino, Javier Cánovas & Jordi Cabot
ICREA – Open University of Catalonia
@softmodeling modeling-languages.com
http://pixgood.com/mr-hyde-sketch.html
Both need to work together to have a healthy community
How can we use conceptual models to help?
Nurture your Community
-Data vs Information
-Non-trivial analysis: Integration of different sources
-Conceptual Schema as unifying representation
-Separation of data collection from data analysis
Conceptual Schema
[Figure: DB schemas linked to the conceptual schema through table mappings]
Conferences and papers: DBLP
PCs and topics: web scraping
+ a variety of other (overlapping) sources & formats
Flickr/skepticalview
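The slide separates data collection (DBLP, scraped PC and topic pages, other sources) from analysis by mapping every source onto one conceptual schema. The following is a minimal Python sketch of that idea. The entity names (`Researcher`, `Paper`, `Edition`), the `load` function and the tuple layouts of the raw rows are hypothetical and do not reflect MetaScience's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical conceptual-schema entities (illustrative, not MetaScience's schema).
@dataclass
class Researcher:
    name: str

@dataclass
class Paper:
    title: str
    year: int
    authors: list[Researcher] = field(default_factory=list)

@dataclass
class Edition:
    year: int
    papers: list[Paper] = field(default_factory=list)
    pc_members: list[Researcher] = field(default_factory=list)

def load(dblp_rows, pc_rows):
    """Map raw rows from two sources onto the shared schema.

    dblp_rows: (title, year, "Author A; Author B") tuples (assumed format)
    pc_rows:   (year, member_name) tuples scraped from edition sites (assumed format)
    """
    researchers: dict[str, Researcher] = {}
    editions: dict[int, Edition] = {}

    def researcher(name: str) -> Researcher:
        # One shared Researcher object per distinct (stripped) name across sources.
        key = name.strip()
        return researchers.setdefault(key, Researcher(key))

    def edition(year: int) -> Edition:
        return editions.setdefault(year, Edition(year))

    for title, year, author_list in dblp_rows:
        authors = [researcher(a) for a in author_list.split(";")]
        edition(year).papers.append(Paper(title, year, authors))
    for year, member in pc_rows:
        edition(year).pc_members.append(researcher(member))
    return editions, researchers
```

Analyses can then be written against `Edition` and `Researcher` objects only. Adding a new source means adding a new mapping rather than rewriting the analyses.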
Community Analysis for everybody
Flickr/leg0fenris
Single metrics, e.g. authors per paper
[Figure: authors per paper across editions, y-axis from 0 to 4; ρ = 0.88]
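The slide reports ρ = 0.88 but does not say which coefficient or which variables it relates. The sketch below assumes a correlation between edition year and the per-edition metric and uses Pearson's r. Spearman's rank correlation, often written ρ, would be a drop-in alternative. The input layout and the toy data are hypothetical.

```python
from statistics import mean

def authors_per_paper(papers_by_year):
    """Average number of authors per paper for each edition.

    papers_by_year: {year: [[author, ...], ...]} (hypothetical layout)."""
    return {year: mean(len(authors) for authors in papers)
            for year, papers in sorted(papers_by_year.items())}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data, not the conference's real numbers.
toy = {2012: [["a", "b"], ["c"]], 2013: [["a", "b", "c"]], 2014: [["a", "b", "c", "d"]]}
per_year = authors_per_paper(toy)
rho = pearson(list(per_year), list(per_year.values()))
```

A value close to 1 indicates that the metric grows steadily over the editions, which matches the "positive trend" described in the speaker notes.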
Single metrics, e.g. seniority of researchers
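The slide does not define seniority. A common choice, assumed here, is the number of years since a researcher's first publication, for example the earliest entry on their DBLP page. The function name and the `first_pub_year` mapping below are hypothetical.

```python
from statistics import mean

def average_seniority(first_pub_year, authors, edition_year):
    """Average seniority of the distinct authors of one edition.

    Seniority is ASSUMED to be edition_year minus the year of the author's
    first publication; first_pub_year maps author -> that year."""
    return mean(edition_year - first_pub_year[a] for a in set(authors))

# Toy data: "ann" has two papers in the edition but is counted once.
first = {"ann": 2000, "bob": 2010}
avg = average_seniority(first, ["ann", "bob", "ann"], 2015)
```

Computing this value for each edition shows how the seniority of the community evolves over time.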
Graph-based analysis: Co-authorship graphs
Prolific authors
Frequent collaborations
Clusters
Betweenness (bridge authors)
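Each of the four graph analyses maps onto a standard computation on a co-authorship graph:

- paper counts per author for prolific authors
- edge weights for frequent collaborations
- connected components as a simple notion of clusters
- Brandes' betweenness algorithm for bridge authors

The following is a self-contained sketch in plain Python. It is not necessarily how MetaScience computes these metrics, and libraries such as NetworkX provide equivalents. Function names and input layout are hypothetical.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations

def coauthorship_graph(papers):
    """Undirected graph: adj[author] = co-authors; weight[(a, b)] = shared papers."""
    adj, weight = defaultdict(set), defaultdict(int)
    for authors in papers:
        for a in authors:
            adj.setdefault(a, set())  # single-author papers still create a node
        for a, b in combinations(sorted(set(authors)), 2):
            weight[(a, b)] += 1
            adj[a].add(b)
            adj[b].add(a)
    return adj, weight

def prolific_authors(papers, k=3):
    """The k authors with the most papers."""
    return Counter(a for authors in papers for a in set(authors)).most_common(k)

def clusters(adj):
    """Connected components found by breadth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def betweenness(adj):
    """Brandes' algorithm, unweighted; halved because the graph is undirected."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}
```

In a toy graph where "bob" is the only link between two other authors, `bob` gets a betweenness of 1.0 and the endpoints get 0.0, which is what marks him as a bridge author.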
Historical / temporal view, e.g. PC analysis
ARE PC MEMBERS ACTIVE IN THE CONFERENCE?
60 of the 99 members of the 2015 PC did not publish in the previous 3 editions
ARE ACTIVE MEMBERS BEING IGNORED?
Only 7 researchers published consistently from 2012 to 2014
3 of them were PC members in 2015, while the remaining 4 were not
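Both PC questions reduce to set operations over the author sets of previous editions. The sketch below assumes yearly editions and a three-edition window, as on the slide. The function name, input layout and toy data are hypothetical. As the speaker notes caution, it only counts the publications it is given, so workshop papers are ignored unless they are included.

```python
def pc_activity(pc_members, authors_by_year, year, window=3):
    """Return (inactive PC members, consistently active non-members).

    authors_by_year: {year: set of authors} (hypothetical layout).
    Editions are ASSUMED to be yearly: year-window, ..., year-1."""
    previous = [authors_by_year.get(y, set()) for y in range(year - window, year)]
    published_any = set().union(*previous)
    published_all = set.intersection(*previous) if previous else set()
    inactive = set(pc_members) - published_any   # on the PC, no recent paper
    ignored = published_all - set(pc_members)    # published every year, not on the PC
    return inactive, ignored

# Toy data, not the real 2012-2015 editions.
authors = {2012: {"a", "b"}, 2013: {"a", "c"}, 2014: {"a", "b"}}
inactive, ignored = pc_activity({"a", "d"}, authors, 2015)
```

Here `inactive` is `{"d"}`, and `ignored` is empty because the only consistently active author, `"a"`, is already on the PC.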
Newcomers
[Figure: % of papers with all authors new to the main track; % of papers with all authors new to the conference]
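The two series differ only in how much history counts as "seen before". This sketch assumes the following interpretations:

- "new to the main track": no earlier main-track paper
- "new to the conference": no earlier paper in any track

The speaker notes warn that results depend heavily on this definition. Function names and data are hypothetical.

```python
def earlier_authors(papers_by_year, year):
    """All authors with a paper in an edition before `year`.

    papers_by_year: {year: [[author, ...], ...]} (hypothetical layout)."""
    return {a for y, papers in papers_by_year.items() if y < year
            for authors in papers for a in authors}

def newcomer_share(papers, history):
    """Percentage of papers whose authors are all absent from `history`."""
    if not papers:
        return 0.0
    new = sum(1 for authors in papers if history.isdisjoint(authors))
    return 100.0 * new / len(papers)

# Toy data: main track plus one earlier workshop paper by "e".
main = {2014: [["a", "b"]], 2015: [["a", "c"], ["d"], ["e", "f"]]}
all_tracks = {2014: main[2014] + [["e"]], 2015: main[2015]}
new_to_main = newcomer_share(main[2015], earlier_authors(main, 2015))
new_to_conf = newcomer_share(main[2015], earlier_authors(all_tracks, 2015))
```

Because the conference-wide history is larger, the "new to the conference" share can never exceed the "new to the main track" share.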
NL / topic analysis, e.g. top-30 keywords for the last 10 editions
[Figure panels: from paper abstracts; from topics of interest]
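A top-k keyword list can be approximated by frequency counting after stop-word removal, and comparing the lists from abstracts and from the call-for-papers topics shows where they diverge. The following is a deliberately simple sketch: the stop-word list is a tiny illustrative one, and the authors' actual NL pipeline (stemming, lemmatization, phrase detection) is not described in the slides.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on",
              "we", "is", "this", "with", "our"}

def top_keywords(texts, k=30):
    """Most frequent non-stop-word tokens (longer than 2 letters) across texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]

def compare(abstract_keywords, cfp_keywords):
    """Keywords only in abstracts, and keywords only in the call for papers."""
    a, c = set(abstract_keywords), set(cfp_keywords)
    return sorted(a - c), sorted(c - a)
```

For example, a keyword that tops the abstracts but is missing from the topics of interest suggests that the call for papers does not reflect what the community actually submits.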
http://matt.might.net/articles/phd-school-in-pictures
And there’s much more...
Collaborating with the complex systems group: rich club ordering, small-world behaviour, inter-conference analysis…
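Rich club ordering asks whether well-connected researchers tend to collaborate mostly with each other. The standard unnormalized rich-club coefficient is φ(k) = 2·E_k / (N_k·(N_k − 1)), where N_k is the number of nodes with degree greater than k and E_k is the number of edges among them. The sketch below implements only this formula and is not the complex systems group's model. Studies usually also normalize φ(k) against randomized graphs, which is omitted here. NetworkX offers a `rich_club_coefficient` function for that.

```python
from itertools import combinations

def rich_club_coefficient(adj, k):
    """Unnormalized rich-club coefficient phi(k) of an undirected graph.

    adj: {node: set of neighbours}, assumed symmetric."""
    rich = [v for v, nbrs in adj.items() if len(nbrs) > k]
    n = len(rich)
    if n < 2:
        return 0.0  # phi(k) is undefined here; this sketch returns 0.0 by convention
    edges = sum(1 for u, v in combinations(rich, 2) if v in adj[u])
    return 2 * edges / (n * (n - 1))

# Toy graph: a triangle a-b-c plus a pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
```

For k = 1 the rich nodes a, b and c form a complete triangle, so φ(1) = 1.0.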
CORE analysis (shameless plug)
Tool Support
Flickr/JDHancock
Gitana: Integrated analysis (ER’15)
[Figure: coding platform, issue trackers, communication channels and code review tools as integrated data sources]
https://github.com/SOM-Research/metaScience
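According to the speaker notes, MetaScience has several exporters and writes graph data as GEXF (Graph Exchange XML Format) for visualization in Gephi. Below is a minimal, hypothetical GEXF 1.2 exporter built on Python's standard `xml.etree.ElementTree`. It is not MetaScience's exporter, and `to_gexf` and its input layout are illustrative.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(weights):
    """Serialize an undirected weighted graph {(a, b): weight} as a GEXF 1.2 string."""
    ET.register_namespace("", GEXF_NS)  # emit xmlns="..." without a prefix
    q = lambda tag: f"{{{GEXF_NS}}}{tag}"
    root = ET.Element(q("gexf"), version="1.2")
    graph = ET.SubElement(root, q("graph"), mode="static", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, q("nodes"))
    edges = ET.SubElement(graph, q("edges"))
    ids = {}
    for a, b in weights:
        for name in (a, b):
            if name not in ids:  # assign each researcher a stable numeric id
                ids[name] = str(len(ids))
                ET.SubElement(nodes, q("node"), id=ids[name], label=name)
    for i, ((a, b), w) in enumerate(weights.items()):
        ET.SubElement(edges, q("edge"), id=str(i), source=ids[a],
                      target=ids[b], weight=str(w))
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved with a `.gexf` extension and opened in Gephi. Edge weights, here the number of co-authored papers, are kept as the `weight` attribute.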
Challenges
Flickr/TimPainter
The conceptual schema (CS) keeps growing
Data Collection limitations
And even more challenges…
- Paper classification
  - No clear distinction between paper types
  - Characteristics change from one edition to another (e.g. number of pages for short papers)
- Committee / topics data
  - Conference edition web sites may no longer be available
    - Partial solution: Wayback Machine
  - Committee data look similar, but there is no common “standard” format
- Entity resolution
  - Researchers may use different names
    - Partial solution: DBLP provides aliases
  - Researcher names may appear misspelled (mostly in committee data)
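Entity resolution can be approached in three layers:

1. Normalize names, so that accents, case and spacing do not matter.
2. Consult an alias table, for instance one built from DBLP's alias information.
3. Fall back to fuzzy matching for misspellings.

The sketch below uses the standard `unicodedata` and `difflib` modules. The alias-table format, the `resolve` function and the 0.85 cutoff are assumptions. Fuzzy matching is only a heuristic and can merge distinct people with similar names.

```python
import difflib
import unicodedata

def normalize(name):
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def resolve(name, canonical_names, aliases=None, cutoff=0.85):
    """Map a raw name to a canonical researcher name, or None if no match.

    aliases: optional {alias: canonical} table (hypothetical format)."""
    by_norm = {normalize(c): c for c in canonical_names}
    alias_map = {normalize(a): c for a, c in (aliases or {}).items()}
    key = normalize(name)
    if key in by_norm:          # exact match after normalization
        return by_norm[key]
    if key in alias_map:        # known alternative name
        return alias_map[key]
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    return by_norm[close[0]] if close else None  # likely misspelling, or unknown
```

With canonical names taken from DBLP, names scraped from committee pages can be resolved before they are loaded into the conceptual schema.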
Let’s work together
jordi.cabot@icrea.cat
@softmodeling
modeling-languages.com

MetaScience: Holistic Approach for Research Modeling and Analysis

Editor's Notes

  • #3 There is a duality living inside every one of us. Every member of the research community plays two different roles: the role of the researcher and the role of the evaluator of the research work done by others. And, like it or not, both are very important for the health of the community.
  • #4 For example, this helps the SC (steering committee) make the right decisions regarding the future of the conference, but it also helps authors choose a conference to target.
  • #5 But today we’re going to talk about something else. How did we get involved in this? Even if many of my academic colleagues don’t believe it, there is life beyond papers, so we got interested in understanding how we could have a greater impact.
  • #7 If you can do science, I can do meta-science. We can eat our own dog food. In the rest of the paper we’ll show how we use conceptual modeling to represent and then analyze communities.
  • #8 The size of the community doesn’t really matter, or at least it matters much less than how the community is internally structured.
  • #9 Raw community data is not enough to get any meaningful information. Moreover, raw community data is a mess to look at, and a good community analysis is not trivial to do. For any
  • #10 Mention that it’s incremental
  • #15 Gitana https://github.com/SOM-Research/Gitana
  • #16 The Web Crawler relies on Selenium
  • #17 Not all of them are implemented! This is just to give an idea of the challenges. Microsoft offers the Academic Knowledge API.
  • #18 I’ll now show some of the analyses that can be done. Obviously these are just examples: once you have the data, you can calculate anything you want to know. Our point is that these analyses are useful for the SC to make the right decisions regarding the future of the conference, but also for authors choosing a conference to target (e.g. how easy it is for new authors to enter a conference, how easy it is to become a PC member, ...).
  • #19 Example of a single metric and its positive trend
  • #20 We’re not getting any younger
  • #21 Gitana https://github.com/SOM-Research/Gitana
  • #22 Caution: PC members who did not publish may still have published in workshops or had other responsibilities. Active members being ignored may have co-authors on the PC, so their expertise can be
  • #23 Whether these numbers are good or bad also depends on the comparison with other conferences. It also depends a lot on what exactly you consider a newcomer.
  • #24 Process and business do not show up so strongly in the call for papers, while reverse and enterprise only appear in the enterprise. This can be helpful to evaluate whether the call for papers reflects the reality of the conference.
  • #25 They have more advanced mathematical models that rely on bipartite graphs to calculate emergent properties of the graph.
  • #26 Gitana https://github.com/SOM-Research/Gitana
  • #27 Then, we have other research tools that can help to actually improve WordPress itself
  • #28 MetaScience reuses some components of another of our tools, Gitana, which supports the analysis of software projects (presented here last year).
  • #29 GEXF: Graph Exchange XML Format. Mecana calculates some metrics on the database data, but others are easier to calculate on the graph data. There are several exporters depending on what we want to calculate. We can use Gephi to directly visualize the graphs, but we are also developing our own visualization component. This can be installed and deployed on your own server.
  • #30 Gitana https://github.com/SOM-Research/Gitana
  • #31 Partial online service
  • #32 Then, we have other research tools that can help to actually improve WordPress itself
  • #33 Gitana https://github.com/SOM-Research/Gitana
  • #34 We managed to get the whole university blocked from Google Scholar. As you can imagine, we didn’t make many friends. But no, this was not the reason why I escaped from France and came back to Barcelona. Without APIs we are limited regarding the information we can use. Sure, it would need to be anonymized, but it would be really useful to have data on the review process and the rejected papers.
  • #36 I hope we can then work together to solve some of these challenges and build the tools we need to better understand ourselves and make sure ER continues to be a great conference for many years.