Pedersen acl2011-business-meeting



  1. How would I like to see ACL conferences develop and change in the next five years? Ted Pedersen, Department of Computer Science, University of Minnesota, Duluth. June 22, 2011
  2. More papers with reproducible results...
  3. Why?
     • If we are going to have highly empirical papers where progress is demonstrated via tables of results, then those results must be reproducible by the reader (and the author) to be believable
     • Are we doing science?
     • Other benefits...
         • Empiricism is not a matter of faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008.
  4. Great Progress!
     • Replicability is a specific criterion in reviews
     • Software and data submissions to ACL 2011!
         • 1,146 submissions: 84 w/software, 117 w/data
         • 292 accepted: 30 w/software, 35 w/data
     • Software / data included in Proceedings USB!!
         • 4 w/software+data
         • 13 w/software, 17 w/data
         • 258/292 = 88% with neither software nor data
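The percentages behind these counts can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides. It assumes that "13 w/software" and "17 w/data" mean software only and data only, which is the reading that makes the 258 figure add up. The helper name `share` and the dictionary keys are illustrative.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts taken from the slide.
submitted = {"total": 1146, "software": 84, "data": 117}
accepted = {"total": 292, "software": 30, "data": 35}

# Assumption: the USB counts are disjoint (both, software only, data only).
usb = {"software_and_data": 4, "software_only": 13, "data_only": 17}

with_artifacts = sum(usb.values())
neither = accepted["total"] - with_artifacts

print("submissions with software (%):", share(submitted["software"], submitted["total"]))
print("submissions with data (%):", share(submitted["data"], submitted["total"]))
print("accepted papers with neither on the USB:", neither)
print("share with neither (%):", share(neither, accepted["total"]))
```

Under that assumption, 34 of the 292 accepted papers shipped something on the USB, which leaves the 258 papers cited on the slide.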
  5. Relatively low submission rate for data and code ...
     • Already available, no need to do it?
     • Hard to anonymize existing or released code...
     • Just can't do it?
         • Restrictions on data and code?
         • Data and code aren't ready for public display...
  6. Empirical Evaluation...
     • Randomly selected 10 of the 164 long papers
         • 9 of 10 empirical
     • Reviewed papers to determine degree of replicability
         • Software available?
         • Data available?
         • Description self-contained and complete?
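A sample like this can itself be made reproducible by fixing the random seed. The sketch below is an illustration only: the slides do not say how the 10 papers were drawn, and the seed value and the numbering of papers from 1 to 164 are assumptions.

```python
import random

NUM_LONG_PAPERS = 164  # from the slide
SAMPLE_SIZE = 10       # from the slide
SEED = 2011            # hypothetical seed, chosen only so the draw can be repeated

def draw_sample(seed=SEED):
    """Draw SAMPLE_SIZE distinct paper numbers without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, NUM_LONG_PAPERS + 1), SAMPLE_SIZE))

sample = draw_sample()
print(sample)
```

Calling `draw_sample()` again with the same seed returns the same 10 paper numbers, so a reader could audit exactly the papers that were reviewed.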
  7. Replicability (1-5)
     • Will members of the ACL community be able to reproduce or verify the results in this paper?
     • 5 = could easily reproduce the results.
     • 4 = could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
     • 3 = could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined; the training/evaluation data are not widely available.
     • 2 = would be hard pressed to reproduce the results. The contribution depends on data that are simply not available outside the author's institution or consortium; not enough details are provided.
     • 1 = could not reproduce the results here no matter how hard they tried.
  8. A Table of Results
     Data?           | Code?         | Description?  | Comparison?  | Claim         | Score
     3rd party dist. | 3rd party + ? | Complete?     | self         | self-improve  | 3
     3rd party dist. | 3rd party + ? | Complete?     | self         | self-improve  | 3
     3rd party dist. | No            | Parameters?   | self         | self-improve  | 2
     Closed          | No            | See elsewhere | self         | self-improve  | 1
     Private sharing | 3rd party + ? | Complete?     | self         | self-improve  | 2
     Shared task     | No            | See elsewhere | Shared task  | best ever!    | 1
     Shared task     | 3rd party + ? | Complete      | Shared task  | Lower cost    | 4
     Private sharing | No            | Complete?     | Pub. results | best ever!    | 1
     Private sharing | 3rd party + ? | Parameters?   | Pub. results | Improve over  | 2
     N/A             | N/A           | Complete      | Theoretical  | Improve scope | N/A
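The Score column can be summarized in a few lines. This is a sketch added for illustration. The summary statistics are derived here from the nine numeric scores in the table and are not reported on the slides. The theoretical paper scored N/A is left out.

```python
from statistics import mean, median

# Replicability scores of the nine empirical papers, in table order.
scores = [3, 3, 2, 1, 2, 1, 4, 1, 2]

average = round(mean(scores), 2)
middle = median(scores)

# Papers at 2 or below: "hard pressed to reproduce" or worse on the 1-5 scale.
hard_or_worse = sum(1 for s in scores if s <= 2)

print("mean score:", average)
print("median score:", middle)
print("papers scoring 2 or lower:", hard_or_worse, "of", len(scores))
```

By this tally, six of the nine empirical papers score 2 or lower, and none reaches 5.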
  9. A Few Generalizations...
     • We use data from 3rd parties and shared tasks
         • Still some private sharing and private data :(
         • 1 of 10 submitted data (partial)
     • We use 3rd party code as a starting point...
         • ...but don't provide extensions (3rd party + ?) :(
         • 0 of 10 submitted software
     • Descriptions are often incomplete
         • ...and this is why we need software and data
     • New Age Empiricism
         • Lots of self improvement
  10. Can't anonymize software?
     • Agreed. How anonymous are submissions in the first place?
         • Web searches, plagiarism detectors, etc. often reveal authors anyway
         • We expand on groundbreaking work by Zigglebottom, 1999...
             • (thus spake Zigglebottom)
     • Drop blind submissions
         • Improving Our Reviewing Process (Mani), Computational Linguistics, Volume 37, Number 1, March 2011.
             • (related, e.g., advocates signed reviews)
  11. Expect More. Reward More.
     • Weight replicability higher for accept/reject decisions and best paper awards.
     • Drop blind submissions, enable more transparent review of papers and software/data.
     • Continue initiatives to encourage submission of software/data and enable distribution
         • Nice work ACL 2011!
     • Be careful of domains where data is by definition not sharable