DFISimilarity
Implements the Divergence from Independence (DFI) model based on Chi-square statistics (i.e., the standardized Chi-square distance from independence in term frequency tf).
DFI is both parameter-free and non-parametric:
parameter-free: it does not require any parameter tuning or training.
non-parametric: it does not make any assumptions about word frequency distributions on document collections.
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) with this similarity.
For more information, see: "A nonparametric term weighting method for information retrieval based on measuring the divergence from independence".
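The idea of scoring by divergence from independence can be illustrated with a minimal Python sketch. It is not this class's implementation: the function name `dfi_score`, its parameters, the add-one smoothing of the collection counts, and the final `log2` damping are all assumptions made for illustration. The sketch compares the observed term frequency with the frequency expected if the term were spread independently of documents, and uses the standardized distance `(tf - expected) / sqrt(expected)`, whose square is the Chi-square term.

```python
import math


def dfi_score(freq, doc_len, total_term_freq, num_field_tokens, boost=1.0):
    """Hypothetical DFI-style score of one term in one document.

    freq             -- observed term frequency (tf) in the document
    doc_len          -- number of tokens in the document
    total_term_freq  -- occurrences of the term in the whole collection
    num_field_tokens -- total number of tokens in the collection
    """
    # Frequency expected under independence: the term's share of the
    # collection, scaled to this document's length. The +1 smoothing on
    # both counts is an assumption of this sketch.
    expected = (total_term_freq + 1) * doc_len / (num_field_tokens + 1)

    # A term that occurs no more often than chance predicts carries no evidence.
    if freq <= expected:
        return 0.0

    # Standardized distance from independence (square it for the Chi-square term).
    measure = (freq - expected) / math.sqrt(expected)

    # Logarithmic damping so very frequent terms do not dominate (assumed here).
    return boost * math.log2(measure + 1)
```

With these assumptions, a term that appears 18 times in a 100-token document when 9 occurrences are expected gets a positive score, while a term at or below its expected frequency scores zero. No parameter is tuned anywhere in the computation, which is what "parameter-free" refers to above.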
See also
Properties
Functions
Computes the normalization value for a field at index-time.
Computes any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
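To make "collection-level" concrete, here is a small Python sketch that gathers such statistics from a toy corpus. The function `collection_stats` and the dictionary keys it returns are hypothetical names chosen for this example and are not part of this class's API. The point is only that these values depend on the whole collection rather than on any single document, so they can be computed once per query term.

```python
from collections import Counter


def collection_stats(docs):
    """Collect collection-wide statistics from a list of tokenized documents.

    docs -- list of documents, each a list of token strings
    """
    total_term_freq = Counter()  # occurrences of each term across all documents
    doc_freq = Counter()         # number of documents containing each term
    for tokens in docs:
        total_term_freq.update(tokens)
        doc_freq.update(set(tokens))

    num_tokens = sum(len(tokens) for tokens in docs)
    return {
        "num_docs": len(docs),
        "num_tokens": num_tokens,
        "avg_doc_len": num_tokens / len(docs),
        "total_term_freq": dict(total_term_freq),
        "doc_freq": dict(doc_freq),
    }


stats = collection_stats([["a", "b", "a"], ["b", "c"]])
```

For a DFI-style model, the total term frequency and the total token count are the inputs that matter most, since together they give the share of the collection occupied by a term.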