Pedersen acl2011-business-meeting

Published in: Education, Technology

  1. How would I like to see ACL conferences develop and change in the next five years?
     Ted Pedersen
     Department of Computer Science, University of Minnesota, Duluth
     http://www.d.umn.edu/~tpederse
     June 22, 2011
  2. More papers with reproducible results...
  3. Why?
       • If we are going to have highly empirical papers where progress is demonstrated via tables of results, then those results must be reproducible by the reader (and the author) to be believable
       • Are we doing science?
       • Other benefits...
           • Empiricism is not a matter of faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008.
             http://aclweb.org/anthology-new/J/J08/J08-3010.pdf
  4. Great Progress!
       • Replicability is a specific criterion in reviews
       • Software and data submissions to ACL 2011!
           • 1,146 submissions: 84 w/software, 117 w/data
           • 292 accepted: 30 w/software, 35 w/data
       • Software / data included in Proceedings USB!!
           • 4 w/software+data
           • 13 w/software, 17 w/data
           • 258/292 = 88% with neither software nor data
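The percentages behind slide 4 can be rechecked with a few lines of arithmetic. This is a minimal sketch using only the counts quoted on the slide. The dictionary names and the `share` helper are illustrative, and the sketch assumes the three USB categories are disjoint (software+data, software only, data only), which is the only reading under which the slide's 258/292 figure adds up.

```python
# Counts copied from slide 4 (ACL 2011 submissions and acceptances).
submitted = {"total": 1146, "software": 84, "data": 117}
accepted = {"total": 292, "software": 30, "data": 35}

# Items included on the Proceedings USB, per slide 4.
# Assumption: the three categories do not overlap.
usb = {"software_and_data": 4, "software_only": 13, "data_only": 17}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


with_artifacts = sum(usb.values())            # papers with software and/or data
neither = accepted["total"] - with_artifacts  # papers with neither

print("submitted w/software:", share(submitted["software"], submitted["total"]), "%")
print("submitted w/data:    ", share(submitted["data"], submitted["total"]), "%")
print("accepted w/software: ", share(accepted["software"], accepted["total"]), "%")
print("accepted w/data:     ", share(accepted["data"], accepted["total"]), "%")
print("USB with neither:    ", neither, "of", accepted["total"],
      "=", share(neither, accepted["total"]), "%")
```

Under that assumption the last line reproduces the slide's 258 of 292, roughly 88%.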
  5. Relatively low submission rate for data and code...
       • Already available, no need to do it?
       • Hard to anonymize existing or released code...
       • Just can't do it?
           • Restrictions on data and code?
           • Data and code aren't ready for public display...
  6. Empirical Evaluation...
       • Randomly selected 10 of the 164 long papers
           • 9 of 10 empirical
       • Reviewed papers to determine degree of replicability
           • Software available?
           • Data available?
           • Description self-contained and complete?
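The talk does not say how the 10 papers were drawn. As a hedged illustration of how such a selection could itself be made reproducible, the sketch below draws 10 of 164 papers with a fixed seed. The seed value, the 1..164 paper IDs, and the `draw_sample` function are all hypothetical and are not the author's procedure.

```python
import random
from typing import List

NUM_LONG_PAPERS = 164  # from slide 6
SAMPLE_SIZE = 10       # from slide 6
SEED = 2011            # hypothetical: the talk does not state a seed


def draw_sample(seed: int = SEED) -> List[int]:
    """Draw SAMPLE_SIZE distinct paper IDs; the same seed gives the same sample."""
    rng = random.Random(seed)                  # private generator, no global state
    paper_ids = range(1, NUM_LONG_PAPERS + 1)  # hypothetical IDs 1..164
    return sorted(rng.sample(paper_ids, SAMPLE_SIZE))


if __name__ == "__main__":
    print(draw_sample())
```

Publishing the seed alongside the list of paper IDs would let a reader confirm that the sample was not hand-picked.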
  7. Replicability (1-5)
       • Will members of the ACL community be able to reproduce or verify the results in this paper?
       • 5 = could easily reproduce the results.
       • 4 = could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
       • 3 = could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined; the training/evaluation data are not widely available.
       • 2 = would be hard pressed to reproduce the results. The contribution depends on data that are simply not available outside the author's institution or consortium; not enough details are provided.
       • 1 = could not reproduce the results here no matter how hard they tried.
  8. A Table of Results

     Data?            Code?          Description?   Comparison?   Claim          Score
     3rd party dist.  3rd party + ?  Complete?      self          self-improve   3
     3rd party dist.  3rd party + ?  Complete?      self          self-improve   3
     3rd party dist.  No             Parameters?    self          self-improve   2
     Closed           No             See elsewhere  self          self-improve   1
     Private sharing  3rd party + ?  Complete?      self          self-improve   2
     Shared task      No             See elsewhere  Shared task   best ever!     1
     Shared task      3rd party + ?  Complete       Shared task   Lower cost     4
     Private sharing  No             Complete?      Pub. results  best ever!     1
     Private sharing  3rd party + ?  Parameters?    Pub. results  Improve over   2
     N/A              N/A            Complete       Theoretical   Improve scope  N/A
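The replicability scores on slide 8 are easy to summarize. The sketch below is an illustration only: it copies the nine numeric scores from the slide in row order, leaves out the theoretical paper scored N/A, and computes a mean, median, and distribution that the talk itself does not report.

```python
from collections import Counter
from statistics import mean, median

# Replicability scores from slide 8, in row order.
# The tenth paper is theoretical and scored N/A, so it is excluded.
scores = [3, 3, 2, 1, 2, 1, 4, 1, 2]

distribution = Counter(scores)                 # how many papers got each score
average = round(mean(scores), 2)               # mean of the nine scored papers
middle = median(scores)                        # median score
hard_or_worse = sum(1 for s in scores if s <= 2)  # "hard pressed" or "could not"

for score in sorted(distribution):
    print(f"score {score}: {distribution[score]} paper(s)")
print("mean:", average, "median:", middle)
print(f"{hard_or_worse} of {len(scores)} scored 2 or lower")
```

By this tally, six of the nine empirical papers sampled scored 2 or lower, and none reached 5.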
  9. A Few Generalizations...
       • We use data from 3rd parties and shared tasks
           • Still some private sharing and private data :(
           • 1 of 10 submitted data (partial)
       • We use 3rd party code as a starting point...
           • ...but don't provide extensions (3rd party + ?) :(
           • 0 of 10 submitted software
       • Descriptions are often incomplete
           • ...and this is why we need software and data
       • New Age Empiricism
           • Lots of self improvement
  10. Can't anonymize software?
       • Agreed. How anonymous are submissions in the first place?
           • Web searches, plagiarism detectors, etc. often reveal authors anyway
           • We expand on groundbreaking work by Zigglebottom, 1999...
               • (thus spake Zigglebottom)
       • Drop blind submissions
           • Improving Our Reviewing Process (Mani), Computational Linguistics, Volume 37, Number 1, March 2011.
               • (related, e.g., advocates signed reviews)
  11. Expect More. Reward More.
       • Weight replicability higher for accept/reject decisions and best paper awards.
       • Drop blind submissions, enable more transparent review of papers and software/data.
       • Continue initiatives to encourage submission of software/data and enable distribution
           • Nice work ACL 2011!
       • Be careful of domains where data is by definition not sharable
