IR Presentation

  1. IR 2009/2010
     Weka Experiments
     Niek de Moel
  2. Approach (1/2)
     Create the basics
     Number of documents used for testing (500/1000/2000/4000/8000/16000/full)
     Define the minimum and maximum document frequency
     "How do we select the key attributes/features? Which is the best predictor?"
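A minimum/maximum document-frequency threshold keeps only terms that occur in at least `min_df` and at most `max_df` documents, dropping both rare noise terms and near-universal words. The sketch below is independent of Weka and assumes naive lowercase whitespace tokenization; the function names `document_frequencies` and `select_vocabulary` are hypothetical, and `max_df` is accepted either as an absolute count or as a fraction of the corpus size.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents it occurs in (at most once per document)."""
    df = Counter()
    for doc in docs:
        # Naive lowercase whitespace tokenization (an assumption for this sketch).
        df.update(set(doc.lower().split()))
    return df


def select_vocabulary(docs, min_df=1, max_df=None):
    """Return the sorted terms whose document frequency lies between min_df and max_df.

    max_df may be an absolute count (int), a ratio of the corpus size (float),
    or None for no upper bound.
    """
    df = document_frequencies(docs)
    if max_df is None:
        max_count = len(docs)
    elif isinstance(max_df, float):
        max_count = int(max_df * len(docs))
    else:
        max_count = max_df
    return sorted(term for term, n in df.items() if min_df <= n <= max_count)


# Toy corpus, purely illustrative.
docs = [
    "the cabinet fell",
    "the match was won",
    "the cabinet met again",
    "a new match",
]
print(select_vocabulary(docs, min_df=2, max_df=0.5))  # ['cabinet', 'match']
```

Here "the" (in 3 of 4 documents) exceeds the 50% ceiling, and the single-occurrence terms fall below `min_df=2`.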
  3. Approach (2/2)
     Testing
     Using cross-validation.
     Data:
       F-measure (precision / recall)
       ROC (receiver operating characteristic)
         TP vs. FP
       Classifier errors
         Correctly vs. Incorrectly Classified Instances
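The measures listed above can all be derived from the actual and predicted labels of the held-out folds. The sketch below shows a plain shuffled k-fold split and one-vs-rest per-class metrics; unlike Weka, which stratifies folds when the class is nominal, it does not stratify. All function names are hypothetical, and the class labels are illustrative only. A single (FP rate, TP rate) pair is one point in ROC space; an ROC curve traces such points as the classifier's decision threshold varies.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def class_metrics(actual, predicted, positive):
    """One-vs-rest metrics for the class `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the TP rate
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp_rate": recall, "fp_rate": fp_rate, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def classified_counts(actual, predicted):
    """Return (correctly, incorrectly) classified instance counts."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct, len(actual) - correct


# Fold sizes for 10 instances split 3 ways.
folds = k_fold_indices(10, k=3)
print([len(f) for f in folds])  # [4, 3, 3]

# Illustrative labels only; in a real run these come from the held-out folds.
actual = ["nova", "nova", "netwerk", "netwerk", "nova"]
predicted = ["nova", "netwerk", "netwerk", "nova", "nova"]
print(class_metrics(actual, predicted, "nova"))
print(classified_counts(actual, predicted))  # (3, 2)
```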
  4. Results so far
     The size of the test corpus affects speed, not results.
     Minimum document frequency 1-10: no pattern. Around 35 for corpus size 500. Still working on this.
     Classifying Nova, 2 Vandaag and Netwerk is difficult (they have a lot of similarities).
     Next steps:
     - Maximum document ratio
     - Does the Experiment Environment work?
     - Class attribute was not numeric...
     - Are iterations needed?
     - Compare algorithms
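When comparing two algorithms on the same cross-validation folds, per-fold scores are not independent because the training sets overlap, so a plain paired t-test tends to overstate significance. Weka's Experimenter offers a corrected paired t-test for this; the sketch below computes the corrected resampled t statistic of Nadeau and Bengio from paired per-fold scores. The function name is hypothetical and the accuracies are made-up numbers, not results from these experiments.

```python
import math
import statistics


def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled t statistic (Nadeau & Bengio) for paired per-fold scores.

    test_train_ratio is n_test / n_train, e.g. 1/9 for 10-fold cross-validation.
    With r runs of k-fold CV, pass all r*k paired scores.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    var = statistics.variance(diffs)  # sample variance (divides by n - 1)
    if var == 0:
        raise ValueError("score differences have zero variance")
    # The plain paired t-test would use var / n; the extra test_train_ratio term
    # accounts for the dependence caused by overlapping training sets.
    return mean / math.sqrt((1 / n + test_train_ratio) * var)


# Made-up per-fold accuracies for two classifiers on 5-fold CV (not real results).
acc_a = [0.80, 0.82, 0.78, 0.81, 0.79]
acc_b = [0.76, 0.79, 0.77, 0.78, 0.75]
t = corrected_paired_t(acc_a, acc_b, test_train_ratio=1 / 4)
print(round(t, 2))  # 3.65
```

The statistic is then compared against a Student t distribution with n - 1 degrees of freedom; the uncorrected test on the same numbers would give a larger t (about 5.48).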