NLP Rankings: Publication-based Ranking System and Platform for NLP Research

NLP RANKINGS: PUBLICATION-BASED RANKING
SYSTEM AND PLATFORM FOR NLP RESEARCH
BY: CHLOE LEE

AGENDA
¡ Introduction
¡ Related Works
¡ NLP Rankings
¡ Demonstration
¡ Analysis
¡ Conclusion

INTRODUCTION
¡ Growing demand to analyze unstructured data brings tremendous attention to the field of Natural Language
Processing (NLP)
¡ Relatively new field compared to other well-established disciplines and programs
¡ Limited information to assess the quality of NLP research environment at different universities

INTRODUCTION - PURPOSE
¡ Provide insights regarding NLP programs in the United States to the research community by creating a
customizable ranking dedicated to NLP
¡ Particularly for current faculty and prospective students interested in NLP

RELATED WORKS
Generic University
Rankings
U.S. News Rankings
QS World University Rankings
Publication-Based
University Rankings
NTU Rankings
CSRankings

RELATED WORKS – GENERIC RANKINGS
¡ U.S. News Rankings
¡ Ranks universities in the United States based on
1. expert opinions about the program excellence
2. statistical indicators that measure the quality of a school’s
faculty, research, and students
¡ data used to calculate the rankings comes from
statistical surveys answered by academic professionals
¡ Opinion-based
¡ QS World University Rankings
¡ Ranks universities in the world based on
1. Academic Reputation (40%)
2. Employer Reputation (10%)
3. Faculty/Student Ratio (20%)
4. Citations per faculty (20%)
5. International Faculty Ratio (5%)
6. International Student Ratio (5%)
¡ Opinion-based

RELATED WORKS – PUBLICATION-BASED RANKINGS
¡ NTU Rankings
¡ Ranks universities in the world based on
1. Research productivity (25%)
2. Research Impact (35%)
3. Research Excellence (40%)
¡ rankings reflect university’s research output in terms of
publication quantity and quality
¡ Research quality is measured by citations, which
may be susceptible to citation cartels

RELATED WORKS – PUBLICATION-BASED RANKINGS
¡ CSRankings
¡ Compiled by Emery Berger
¡ Metric-based ranking system that ranks Computer
Science programs
¡ Ranking universities by their presence at prestigious
publication venues
¡ Ranking scores change as faculty move
¡ Publication venues carry equal values
¡ Limited venues selected for NLP programs

NLP RANKINGS – DATA COLLECTION
¡ Publications published from 2010 to 2019 are collected from ACL Anthology
¡ Publication conferences and venues selected
1. Annual Meeting of the Association for Computational Linguistics (ACL)
2. Computational Linguistics (CL)
3. International Conference on Computational Linguistics (COLING)
4. Conference on Computational Natural Language Learning (CoNLL)
5. European Chapter of ACL (EACL)
6. Conference on Empirical Methods in NLP (EMNLP)
7. International Joint Conference on NLP (IJCNLP)
8. North American Chapter of ACL (NAACL)
9. Transactions of the Association for Computational Linguistics (TACL)
10. Workshop and demonstration papers (WS)
¡ Total number of publications: 24,896
¡ By academic authors in the US: 6,261
¡ Total number of unique authors: 24,838
¡ Unique authors in the US: 7,426

NLP RANKINGS – PUBLICATIONS OVER TIME
[Figure: Number of NLP publications over the last 10 years]
[Figure: Number of NLP authors over the last 10 years]

NLP RANKINGS – AUTHOR-UNIVERSITY MATCHING
¡ Email addresses are extracted by using a comprehensive group of regular expressions
¡ Email addresses of publication authors are important for institutional authorship

NLP RANKINGS – AUTHOR-UNIVERSITY MATCHING
¡ The order in which email addresses are presented might not
match the order of the authors
¡ A list of email addresses is pseudo-generated from the
authors’ names under the following conventions
¡ firstname lastname
¡ f (m) lastname
¡ lastname f (m)
¡ firstname
¡ lastname
¡ f (m) l
Levenshtein distance matrix (rows: extracted emails; columns: the 18 pseudo-generated emails)
Email 1:  20 15 20 20 16 20 14  4 13 10  1 12 15 18 18 15 16 13
Email 2:  15 18 16 16 16 16 15 18 20 18 16 18  7  8 15  5  1  7
Email 3:   7  6 12  8  2 10 17 13 20 18 14 17 18 16 15 16 15 18
¡ Match authors and email addresses by Levenshtein
distance
¡ Start with the minimum of the matrix
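
The matching step above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the helper names (`levenshtein`, `pseudo_locals`, `match_authors`), the example names and addresses, and the choice to ignore separators such as dots are all assumptions. It generates the six local-part conventions per author, computes Levenshtein distances against each extracted email, and assigns pairs greedily starting from the smallest distance.

```python
from itertools import product


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pseudo_locals(name):
    """Candidate email local parts for one author (six conventions, no separators)."""
    parts = name.lower().split()
    first, last = parts[0], parts[-1]
    middle = "".join(p[0] for p in parts[1:-1])  # middle initials, if any
    f_m = first[0] + middle
    return [first + last, f_m + last, last + f_m, first, last, f_m + last[0]]


def match_authors(authors, emails):
    """Greedy assignment: repeatedly take the smallest remaining distance."""
    cells = []
    for email, author in product(emails, authors):
        local = email.split("@")[0].lower()
        dist = min(levenshtein(local, cand) for cand in pseudo_locals(author))
        cells.append((dist, email, author))
    cells.sort()
    matched, used_emails, used_authors = {}, set(), set()
    for dist, email, author in cells:
        if email not in used_emails and author not in used_authors:
            matched[author] = email
            used_emails.add(email)
            used_authors.add(author)
    return matched


# Hypothetical example: the emails are listed in a different order than the authors.
authors = ["Jane Q Doe", "John Smith", "Alice Wong"]
emails = ["jsmith@cs.example.edu", "awong@example.edu", "janedoe@ling.example.edu"]
print(match_authors(authors, emails))
```

Once each author has an email, the domain of that email can be used to look up the institution; that lookup is outside this sketch.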

NLP RANKINGS – SCORING MECHANISM
¡ Different publication conferences and venues carry different weights
¡ major venues (ACL, CL, EMNLP, TACL, NAACL): 3
¡ other conferences: 2
¡ workshops/demonstrations: 1
¡ Credit for each publication is evenly distributed to all authors
¡ each author receives a score of $w / a$ for each publication, where $w$ is the venue weight and $a$ is the number of authors
¡ Institutional score = sum of the scores of authors who attribute their work to the institution
¡ Students’ contributions also count (not just faculty members’)
¡ If an author moves, one’s previous score will not be transferred to the new institution
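
A minimal sketch of this scoring scheme, under stated assumptions: each publication is represented as a venue code plus one affiliation per author (with `None` for authors outside the ranked institutions), and the function and variable names are hypothetical. Because credit is attached to the affiliation recorded on each paper, an author's earlier score stays with the earlier institution.

```python
from collections import defaultdict

MAJOR_VENUES = {"ACL", "CL", "EMNLP", "TACL", "NAACL"}
WORKSHOP_VENUES = {"WS"}


def venue_weight(venue):
    """Default weights from the slide: major = 3, other conferences = 2, workshops/demos = 1."""
    if venue in MAJOR_VENUES:
        return 3.0
    if venue in WORKSHOP_VENUES:
        return 1.0
    return 2.0


def institution_scores(publications):
    """publications: list of (venue, [affiliation of each author at publication time])."""
    scores = defaultdict(float)
    for venue, affiliations in publications:
        share = venue_weight(venue) / len(affiliations)  # w / a per author
        for institution in affiliations:
            if institution is not None:  # skip authors with no ranked institution
                scores[institution] += share
    return dict(scores)


# Toy data for illustration only.
pubs = [
    ("ACL", ["CMU", "CMU", "Stanford"]),  # each author gets 3 / 3 = 1.0
    ("WS", ["Stanford", None]),           # each author gets 1 / 2 = 0.5
]
print(institution_scores(pubs))
```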

DEMONSTRATION
Rankings Visualizations

ANALYSIS
¡ University-Level Analysis
¡ Author-Level Analysis
¡ User Analysis

ANALYSIS – UNIVERSITY-LEVEL ANALYSIS
¡ Carnegie Mellon University remained 1st for the
past ten years
¡ Ranking score gaps are more significant between top
universities
¡ Top universities remained largely competitive over time
¡ Most top 50 universities showed an upward movement
in ranking year over year
¡ average rank change between 2010 and 2019 is 15.52

ANALYSIS – UNIVERSITY-LEVEL ANALYSIS
Rank  Institution                                   2010  2011  2012  2013  2014  2015  2016  2017  2018  2019
1     Carnegie Mellon University                       1     1     1     1     1     1     1     1     1     1
2     University of Washington                         5     6    10     4     7     3     3     3     2     2
3     Stanford University                              7     8     5     3     2     2     2     4     3     4
4     Johns Hopkins University                        10     5     6     6     6     4     5     2     4     3
5     Columbia University                              4     2     2     2     3     5     6    15    23    12
6     Massachusetts Institute of Technology            9    11    13    10    10     6     6     6     7     5
7     University of Illinois at Urbana-Champaign       2    10     9     7    12    10     4     7    12     8
8     University of California, Berkeley               3     9     8     8     9     9     8     5    19    21
9     University of Pennsylvania                      11    13     3    19    13    13    13    10     5     6
10    University of Maryland                           8     7     4    12     5     8    10     9    14    13

ANALYSIS – UNIVERSITY TREND CLUSTERING
¡ Hierarchical cluster analysis groups universities
by similarity in their trends, using Ward's variance
minimization algorithm
¡ Grouped into 3 major clusters
¡ Red: 26 high-tier universities (top 30)
¡ Blue: Carnegie Mellon University
¡ Green: 189 mid-lower tier universities
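
To make the clustering step concrete, here is a naive pure-Python sketch of agglomerative clustering with Ward's merge criterion. It is an illustration only, not the analysis code behind the slide: the function name `ward_clusters`, the toy score series, and the stopping rule (a fixed number of clusters `k >= 1`) are assumptions. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least, which for clusters A and B equals |A||B| / (|A| + |B|) times the squared distance between their centroids.

```python
def ward_clusters(series, k):
    """Merge clusters by Ward's criterion until k clusters remain (k >= 1)."""
    # Each cluster is (member indices, centroid vector).
    clusters = [([i], list(s)) for i, s in enumerate(series)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                sq_dist = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Increase in within-cluster variance if i and j were merged.
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        n = len(mi) + len(mj)
        centroid = [(len(mi) * a + len(mj) * b) / n for a, b in zip(ci, cj)]
        clusters[i] = (mi + mj, centroid)
        del clusters[j]
    return [sorted(members) for members, _ in clusters]


# Toy yearly score trends (made-up numbers): two similar pairs and one outlier.
trends = [
    [1.0, 1.0, 1.0],
    [1.1, 1.0, 0.9],
    [10.0, 10.0, 10.0],
    [10.0, 9.9, 10.2],
    [50.0, 52.0, 51.0],
]
print(ward_clusters(trends, 3))
```

For a few hundred universities this quadratic search per merge is still workable; a library implementation of Ward linkage would be the usual choice in practice.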

ANALYSIS – SUB-CLUSTER EXAMPLE
¡ Very similar ranking score trend
¡ Could possibly suggest similar research interest
¡ Information Sciences Institute
¡ Kevin Knight: NLP, machine translation, automata theory and decipherment
¡ University of California, Berkeley
¡ Dan Klein: Unsupervised language acquisition, Machine translation, Information extraction
¡ University of Illinois at Urbana-Champaign
¡ Dan Roth: Artificial Intelligence, natural language understanding
¡ University of Maryland
¡ Philip Resnik: Machine translation, Computational social science, Computational
psycholinguistics and neurolinguistics

ANALYSIS – AUTHOR-LEVEL ANALYSIS
¡ Author ranking is calculated by summing the publication scores of each author
¡ Universities that these top NLP authors have worked for or are working at
1. Carnegie Mellon University
2. Stanford University
3. Columbia University
¡ Universities with only one or two top 100 NLP authors
¡ Language, Information, and Learning lab at Yale (LILY)
¡ Dragomir Radev (Top 100 NLP Author, NLP Faculty at Yale)
¡ Brown Laboratory for Linguistic Information Processing (BLLIP)
¡ Eugene Charniak (Top 100 NLP Author, NLP Faculty at Brown)

ANALYSIS – AUTHOR-LEVEL ANALYSIS
Universities Attended by Top 100 NLP Authors

ANALYSIS – WEIGHT-CONTRIBUTION INDEX
¡ Rankings as a sum of scores may be deceiving because
they are computed at the university level
¡ Individual performance is measured at the author level
¡ Jorge E. Hirsch (2005): h-index
¡ the largest number $h$ such that $h$ papers each have a citation count ≥ $h$
¡ encourages a large number of high-quality publications
¡ citations can be misleading
¡ Weight-contribution index (wc-index) for NLP Rankings
¡ Different publication venues carry different
weights (some are more major than others)
¡ Score for each author in each publication = $w / a$ (venue weight $w$ divided by number of authors $a$)
¡ Index is calculated by counting the number of papers with
score > 1
¡ Identifies how active and independent researchers are
¡ Shows the behavior and current status of researchers
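
The two indices can be compared with a short sketch. It assumes each paper's per-author score is the venue weight divided by the number of authors, as in the scoring mechanism, and the function names and toy inputs are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations (Hirsch, 2005)."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


def wc_index(papers):
    """Number of an author's papers whose per-author score w / a exceeds 1.

    papers: list of (venue_weight, number_of_authors) pairs.
    """
    return sum(1 for weight, n_authors in papers if weight / n_authors > 1)


# Toy inputs for illustration only.
print(h_index([10, 8, 5, 4, 3]))
print(wc_index([(3, 2), (3, 3), (2, 1), (1, 1), (2, 2)]))
```

Under the default weights, a paper counts toward the wc-index only when it appears at a major venue with at most two authors or at another conference with a single author, which is why the index favors researchers who publish in small teams.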

ANALYSIS – WEIGHT-CONTRIBUTION INDEX
wc-index by year (columns 2015 to 2019); "-" marks an h-index missing in the source
Rank  Name                    h-index*  2015  2016  2017  2018  2019
1     Dan Roth                      50     5    13    17    25    31
2     Noah A. Smith                 53     7     8    11    17    23
3     Dan Klein                     47     6     9    18    23    26
4     Christopher D. Manning        90     6    12    15    18    20
5     Eduard Hovy                   54     2     5     6     7    12
6     Mohit Bansal                  30     1     4     7    20    30
7     Vincent Ng                    30     5     8    10    11    11
8     Luke Zettlemoyer              45     4     7     7    10    16
9     Claire Cardie                 43     2     3     7    13    16
10    Graham Neubig                 32     0     2     5    13    18
11    Chris Dyer                    53     5     5     5     5     5
12    Heng Ji                       38     0     2     3     3     5
13    Kevin Knight                  42     3     7     9    11    11
14    William Yang Wang             24     3     3     6    13    19
15    Jason Eisner                  32     4     9    12    16    19
16    Regina Barzilay                -     5     8    10    12    14
17    Mona Diab                     33     0     1     1     4     4
18    Dan Jurafsky                  63     3     5     7     7     7
19    Nizar Habash                  33     2     3     4     7     9
20    Jordan Boyd-Graber            33     2     5     8    10    14
21    Kathleen McKeown              33     4     6     8     9    14
22    Mark Dredze                   49     8    11    11    13    14
23    Percy Liang                   45     4    10    12    15    18
24    Rada Mihalcea                  -     2     4     7     8    10
25    Yejin Choi                    38     0     2     4     5     5
26    Kevin Gimpel                  27     1     1     4     6     9
27    Jacob Eisenstein              30     6    10    12    15    15
28    Tom Mitchell                  54     4     6     9    11    13
29    Yang Liu                      26     4     5     5     5     5
30    Bing Liu                      69     2     3     5     5     5
joined Google, his recent publications (after
2017) are under Google’s authorship

ANALYSIS – USER ANALYSIS
¡ February 12, 2020 - March 28, 2020 (46 days)
¡ total of 3,913 accesses
¡ 1,219 distinct IP addresses
¡ Time period viewed
¡ 97.3% viewed the default 2010-2019
¡ 2015 and 2016 are the start years that are checked the most, followed by 2018 and 2017
¡ Interested in most recent years
¡ Suggests beginning of the emerging interest in NLP

ANALYSIS – USER ANALYSIS
¡ Weight Customization
¡ 99.2% use the default weights
¡ Suggests that most users accept the proposed values
¡ Re-Visit Frequency (Of the 1,219 unique IP addresses)
¡ 73.9% of the users only viewed the site once
¡ 18.7% used it twice
¡ 3.0% used it on three different days
Histogram of unique IP re-visit frequency
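
A re-visit distribution like the one above could be derived from an access log along these lines. This is a sketch under assumptions: the log format (one `(ip, date)` pair per access), the function name, and the reading of "re-visit frequency" as the number of distinct days on which an IP accessed the site are not specified in the slides.

```python
from collections import Counter


def revisit_distribution(accesses):
    """Map number of distinct visit days -> share of unique IPs with that count."""
    days_per_ip = {}
    for ip, day in accesses:
        days_per_ip.setdefault(ip, set()).add(day)
    counts = Counter(len(days) for days in days_per_ip.values())
    total = len(days_per_ip)
    return {n_days: counts[n_days] / total for n_days in sorted(counts)}


# Toy log for illustration only (documentation-range IP addresses).
log = [
    ("192.0.2.1", "2020-02-12"),
    ("192.0.2.1", "2020-02-12"),  # same day, counted once
    ("192.0.2.2", "2020-02-12"),
    ("192.0.2.2", "2020-02-13"),
    ("192.0.2.3", "2020-03-01"),
    ("192.0.2.4", "2020-03-02"),
]
print(revisit_distribution(log))
```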

CONCLUSION
¡ NLP Rankings is a tool to evaluate and identify NLP programs in the United States
¡ Proposing different methods to evaluate NLP programs and researchers

NEXT STEPS
1. Platform running time is relatively short
¡ With a longer time horizon, further analysis can be conducted to reevaluate the usefulness of NLP Rankings platform
¡ Especially during application seasons in the Fall
2. Cluster Analysis and Topic Modeling
¡ Research interest and focus at different universities are also important factors
¡ Identify main NLP research interests at each institution
3. Trend Analysis and Topic Modeling
¡ Identify trending research topics over the past decade

REFERENCES
¡ A. F. M. A. Al-Juboori, Y. Na and F. Ko, "University ranking and evaluation: Trend and existing approaches," The 2nd International Conference on Next Generation Information Technology, Gyeongju, 2011, pp. 137-142.
¡ Hirsch, J. E. (2005), An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences Nov 2005, 102 (46) 16569-16572.
¡ Isidro F. Aguillo, José Luís Ortega & Mario Fernández (2008) Webometric Ranking of World Universities: Introduction, Methodology, and Future Developments, Higher Education in Europe, 33:2-3, 233-244.
¡ Jin, B., Liang, L., Rousseau, R. et al. The R- and AR-indices: Complementing the h-index. CHINESE SCI BULL 52, 855–863 (2007).
¡ McPherson, Michael A. (2012), Ranking U.S. Economics Programs by Faculty and Graduate Publications: An Update Using 1994–2009 Data. Southern Economic Journal: July 2012, Vol. 79, No. 1, 71–89.
¡ Morse, Robert. “How U.S. News Calculated the 2021 Best Graduate Schools Rankings.” U.S. News, https://www.usnews.com/education/best-graduate-schools/articles/how-us-news-calculated-the-rankings.
¡ “NTU Ranking – Indicators.” NTU Ranking, http://nturanking.lis.ntu.edu.tw/methodoloyg/indicators.
¡ “QS World University Rankings – Methodology.” QS World University Rankings, https://www.topuniversities.com/qs-world-university-rankings/methodology.
