This document discusses improvements to the widely used TF-IDF term weighting algorithm for natural language processing systems.
It notes two drawbacks of TF-IDF: term frequency alone is not a sufficient indicator of a term's importance, and the scheme does not consider how terms are distributed within documents. To address these, the paper proposes new local and global term weighting algorithms based on term distribution, using the Pearson chi-square test statistic.
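To make the distribution argument concrete, the following is a minimal sketch rather than the paper's algorithm. It computes classic TF-IDF and, separately, a Pearson chi-square statistic that measures how unevenly a term's occurrences fall across equal-sized segments of a document. The segmentation scheme, the function names `tf_idf` and `chi_square_uniformity`, and the `n_segments` parameter are all assumptions made for illustration; the summary does not specify how the proposed weights are built from the statistic.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def chi_square_uniformity(tokens, term, n_segments=4):
    """Pearson chi-square statistic of a term's per-segment counts against
    an even spread over the document (hypothetical use of the statistic)."""
    total = tokens.count(term)
    if total == 0:
        return 0.0
    # Boundaries of n_segments contiguous, near-equal segments.
    bounds = [i * len(tokens) // n_segments for i in range(n_segments + 1)]
    chi2 = 0.0
    for start, end in zip(bounds, bounds[1:]):
        seg_len = end - start
        if seg_len == 0:
            continue
        observed = tokens[start:end].count(term)
        # Expected count if occurrences were spread in proportion to length.
        expected = total * seg_len / len(tokens)
        chi2 += (observed - expected) ** 2 / expected
    return chi2


# Two toy documents with the same frequency of "x" but different distributions.
clustered = ["x", "x", "x", "x", "a", "b", "c", "d"]
spread = ["x", "a", "x", "b", "x", "c", "x", "d"]
other = ["e", "f", "g", "h"]

weights = tf_idf([clustered, spread, other])
chi_clustered = chi_square_uniformity(clustered, "x")
chi_spread = chi_square_uniformity(spread, "x")
```

In this toy corpus, TF-IDF assigns "x" the same weight in both documents because it sees only the count, while the chi-square statistic separates the clustered case from the evenly spread one. That is the kind of information a distribution-based weight could draw on.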
It then describes an experiment that applies the new algorithms to an LSA-based information retrieval system and text classifier, evaluating their reliability and efficiency against TF-IDF.