Active learning (AL) is often used in corpus construction (CC) to select “informative” documents for annotation. This focuses annotation effort well, but AL operates in a closed loop: it selects the points expected to improve an existing model. When no model exists, or the task itself is under-defined (as when studying phenomena for which no corpus yet exists), traditional AL is inapplicable. To remedy this, we propose a novel method for model-free AL that uses phenomena as desirable characteristics. We introduce a tool, MOVE, that helps to iteratively visualise and refine these characteristics. We show its potential in a real-world case study of a corpus we are developing.
Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Dain Kaplan¹, Neil Rubens², Simone Teufel³, Takenobu Tokunaga¹
¹ Tokyo Institute of Technology, Dept. of Computer Science
² Stanford University, mediaX / H-STAR
³ University of Cambridge, Computer Laboratory