21. Today's talk also overlaps with the following talks
[Sakai16SIGIR] Sakai, T.: Statistical Significance, Power, and Sample
Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015, Proceedings
of ACM SIGIR 2016, pp.5-14, 2016.
http://www.slideshare.net/TetsuyaSakai/sigir2016
[Sakai16ICTIRtutorial] Sakai, T.: Topic Set Size Design and Power
Analysis in Practice (Tutorial Abstract), ACM ICTIR 2016, pp.9-10, 2016.
http://www.slideshare.net/TetsuyaSakai/ictir2016tutorial-65845256
(200 slides!)
52. Classical significance tests have been "under fire" since the 1970s
[Johnson99]
• Deming (1975) commented that the reason students have problems
understanding hypothesis tests is that they may be trying to think.
• Carver (1978) recommended that statistical significance testing
should be eliminated; it is not only useless, it is also harmful because
it is interpreted to mean something else.
• Cohen (1994:997) noted that statistical testing of the null hypothesis
"does not tell us what we want to know, and we so much want to
know what we want to know that, out of desperation, we
nevertheless believe that it does!"
89. References (Sakai)
[Sakai06SIGIR] Sakai, T.: Evaluating Evaluation Metrics based on the Bootstrap, ACM SIGIR 2006, pp.525-532.
[Sakai07SIGIR] Sakai, T.: Alternatives to Bpref, ACM SIGIR 2007, pp.71-78, July 2007.
[Sakai+11CIKM] Sakai, T., Kato, M.P. and Song, Y.-I.: Click the Search Button and Be Happy: Evaluating Direct and Immediate
Information Access, ACM CIKM 2011, pp.621-630.
[Sakai+11SIGIR] Sakai, T. and Song, R.: Evaluating Diversified Search Results Using Per-Intent Graded Relevance, ACM SIGIR 2011,
pp.1043-1052.
[Sakai12WWW] Sakai, T.: Evaluation with Informational and Navigational Intents, WWW 2012, pp.499-508.
[Sakai13IRJ] Sakai, T. and Song, R.: Diversified Search Evaluation: Lessons from the NTCIR-9 INTENT Task, Information Retrieval, 16(4),
pp.504-529, Springer, 2013.
[Sakai+13SIGIR] Sakai, T. and Dou, Z.: Summaries, Ranked Retrieval and Sessions: A Unified Framework for Information Access Evaluation,
ACM SIGIR 2013, pp.473-482.
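[Sakai15book] Sakai, T.: 情報アクセス評価方法論: 検索エンジンの進歩のために (Information Access Evaluation Methodology: For the Progress of Search Engines, in Japanese), コロナ社 (Corona Publishing), 2015.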
[Sakai16IRJ] Sakai, T.: Topic Set Size Design, Information Retrieval Journal, 19(3), pp. 256-283, Springer, 2016.
http://link.springer.com/content/pdf/10.1007%2Fs10791-015-9273-z.pdf (open access)
[Sakai16SIGIR] Sakai, T.: Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015, ACM
SIGIR 2016, pp.5-14.
[Sakai16SIGIRshort] Sakai, T.: Two Sample T-tests for IR Evaluation: Student or Welch?, Proceedings of ACM SIGIR 2016, pp.1045-1048.
[Sakai16ICTIRtutorial] Sakai, T.: Topic Set Size Design and Power Analysis in Practice (Tutorial Abstract), ACM ICTIR 2016, pp.9-10.
90. References (others)
[Cohen88] Cohen, J.: Statistical Power Analysis for the Behavioral
Sciences (Second Edition), Psychology Press, 1988.
[Ellis10] Ellis, P. D.: The Essential Guide to Effect Sizes, Cambridge, 2010.
[Johnson99] Johnson, D. H.: The Insignificance of Statistical Significance
Testing, Journal of Wildlife Management, 63(3), 1999.
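[永田03] Nagata, Y.: サンプルサイズの決め方 (How to Design the Sample Size, in Japanese), 朝倉書店 (Asakura Shoten), 2003.
[豊田09] Toyoda, H.: 検定力分析入門 (Introduction to Statistical Power Analysis, in Japanese), 東京図書 (Tokyo Tosho), 2009.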