IR Presentation


  1. IR 2009/2010. Weka Experiments. Niek de Moel.
  2. Approach 1/2: Create basics.
     • Number of documents used for testing (500/1000/2000/4000/8000/1600/full)
     • Define min and max document frequency
     • "How to select key attributes/features? Best predictor?"
  3. Approach 2/2: Testing, using cross-validation. Data:
     • F-Measure (precision / recall)
     • ROC (receiver operating characteristic)
     • TP vs FP
     • Classifier errors
     • Correctly vs incorrectly classified instances
  4. Results so far
     • Test corpus size affects speed, not results.
     • Min document frequency 1-10: no pattern. Around 35 for corpus 500. Still working on this.
     • Classifying Nova, 2 Vandaag and Netwerk is difficult (lots of similarities).
     • Proceeding:
         • Max document ratio.
         • Does the Experiment Environment work?
         • Class attribute was not numeric...
         • Are iterations needed?
         • Compare algorithms.
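
The min/max document frequency bounds mentioned in the approach can be illustrated with a small sketch. This is plain Python, not Weka's own filter; the function name, the parameter names `min_df` and `max_ratio`, and the toy documents are assumptions made for illustration only.

```python
from collections import Counter


def filter_by_document_frequency(docs, min_df=2, max_ratio=0.5):
    """Return the terms that occur in at least `min_df` documents and in
    at most `max_ratio` (a fraction) of all documents."""
    df = Counter()
    for tokens in docs:
        # Count each term once per document, however often it repeats.
        df.update(set(tokens))
    upper = max_ratio * len(docs)
    return {term for term, n in df.items() if min_df <= n <= upper}


# Toy corpus (hypothetical, not the data used in the slides).
docs = [
    ["weka", "classifier", "news"],
    ["weka", "news", "corpus"],
    ["news", "roc"],
    ["news", "weka", "recall"],
]

vocab = filter_by_document_frequency(docs, min_df=2, max_ratio=0.75)
print(sorted(vocab))
```

Terms that appear in nearly every document (here "news") carry little class information, and terms that appear only once are too rare to generalise from, which is why both bounds are tuned.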
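
The evaluation measures listed under testing (F-Measure, TP vs FP) all derive from the confusion-matrix counts of one class. The sketch below shows the arithmetic under that assumption; the function names and the counts are illustrative and are not results from these experiments.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (harmonic mean of the two)
    for one class, from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


def tp_fp_rates(tp, fp, fn, tn):
    """True-positive rate and false-positive rate: one point on a ROC curve."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical counts for a single class.
tp, fp, fn, tn = 40, 10, 20, 30
p, r, f = precision_recall_f(tp, fp, fn)
tpr, fpr = tp_fp_rates(tp, fp, fn, tn)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
```

Under cross-validation these counts are accumulated over the held-out folds, so every instance is scored exactly once by a model that did not see it during training.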