core/org.gnit.lucenekmp.search.similarities/DFISimilarity

DFISimilarity

class DFISimilarity(val independence: Independence, discountOverlaps: Boolean = true) : SimilarityBase

Implements the Divergence from Independence (DFI) model based on chi-square statistics (i.e., the standardized chi-squared distance from independence in term frequency tf).
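A minimal sketch of how a DFI score can be computed for one term in one document. It is modeled on the upstream Apache Lucene DFISimilarity that this class ports; that lucene-kmp uses the identical formula is an assumption. The expected frequency is the term's share of all field tokens scaled by the document length. If the observed tf does not exceed it, the score is zero. Otherwise the independence measure (here chi-squared) is dampened by log2(measure + 1). The function dfiScore and its parameter names are hypothetical.

```kotlin
import kotlin.math.log2

/**
 * Hypothetical sketch of a DFI term score using the chi-squared independence measure.
 *
 * @param freq                observed term frequency in the document
 * @param docLen              document (field) length in tokens
 * @param totalTermFreq       occurrences of the term in the field across the collection
 * @param numberOfFieldTokens total number of tokens in the field across the collection
 * @param boost               query-time boost
 */
fun dfiScore(
    freq: Double,
    docLen: Double,
    totalTermFreq: Long,
    numberOfFieldTokens: Long,
    boost: Double = 1.0,
): Double {
    // Expected tf if the term were independent of this document (+1 smoothing as in recent upstream Lucene).
    val expected = (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1)
    // A term occurring no more often than expected contributes nothing.
    if (freq <= expected) return 0.0
    // Chi-squared distance of the observed tf from the expected tf.
    val measure = (freq - expected) * (freq - expected) / expected
    // Logarithmic dampening of the measure, scaled by the boost.
    return boost * log2(measure + 1)
}

fun main() {
    // A term making up ~1% of the collection but 5 of the 100 tokens in this document:
    // expected = 1.0, chi-squared = 16.0, score = log2(17) ≈ 4.09
    println(dfiScore(freq = 5.0, docLen = 100.0, totalTermFreq = 9_999, numberOfFieldTokens = 999_999))
}
```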

DFI is both parameter-free and non-parametric:

parameter-free: it does not require any parameter tuning or training.
non-parametric: it does not make any assumptions about word frequency distributions in document collections.

It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that, etc.) when using this similarity.

For more information, see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

Constructors

DFISimilarity

constructor(independence: Independence, discountOverlaps: Boolean = true)

Properties

discountOverlaps

val discountOverlaps: Boolean

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
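The effect of discountOverlaps on document length can be sketched as follows. Tokens with a position increment of zero (for example, a synonym stacked on the previous token) are counted separately and, when discounting is on, subtracted from the length. This mirrors upstream Lucene, where the norm is based on the field length minus the number of overlap tokens. The helper fieldLength is hypothetical and works on a plain list of position increments rather than on a FieldInvertState.

```kotlin
/**
 * Hypothetical helper: field length computed from the position increments of its tokens.
 * A position increment of 0 marks an overlap token stacked on the previous position.
 */
fun fieldLength(positionIncrements: List<Int>, discountOverlaps: Boolean = true): Int {
    val length = positionIncrements.size
    val numOverlap = positionIncrements.count { it == 0 }
    // With discounting, overlap tokens do not make the document look longer.
    return if (discountOverlaps) length - numOverlap else length
}

fun main() {
    // "fast car" with the synonym "quick" stacked on "fast": increments 1, 0, 1
    val increments = listOf(1, 0, 1)
    println(fieldLength(increments))                           // 2
    println(fieldLength(increments, discountOverlaps = false)) // 3
}
```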

independence

val independence: Independence
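independence selects the measure of divergence from independence. Upstream Apache Lucene ships three: IndependenceStandardized, IndependenceSaturated and IndependenceChiSquared; whether lucene-kmp exposes the same class names is an assumption. The sketch below restates their formulas, where e is the expected tf, as a hypothetical sealed interface. It is not the library's own Independence type.

```kotlin
import kotlin.math.sqrt

/** Hypothetical mirror of the three upstream Lucene independence measures. */
sealed interface IndependenceMeasure {
    /** Divergence of the observed tf [freq] from the [expected] tf under independence. */
    fun score(freq: Double, expected: Double): Double

    /** Standardized: (tf - e) / sqrt(e). */
    object Standardized : IndependenceMeasure {
        override fun score(freq: Double, expected: Double) = (freq - expected) / sqrt(expected)
        override fun toString() = "Standardized"
    }

    /** Saturated: (tf - e) / e. */
    object Saturated : IndependenceMeasure {
        override fun score(freq: Double, expected: Double) = (freq - expected) / expected
        override fun toString() = "Saturated"
    }

    /** Chi-squared: (tf - e)^2 / e. */
    object ChiSquared : IndependenceMeasure {
        override fun score(freq: Double, expected: Double) =
            (freq - expected) * (freq - expected) / expected
        override fun toString() = "ChiSquared"
    }
}

fun main() {
    val measures = listOf(
        IndependenceMeasure.Standardized,
        IndependenceMeasure.Saturated,
        IndependenceMeasure.ChiSquared,
    )
    // Observed tf 5 against an expected tf of 4.
    for (m in measures) println("$m: ${m.score(freq = 5.0, expected = 4.0)}")
}
```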

Functions

computeNorm

open fun computeNorm(state: FieldInvertState): Long

Computes the normalization value for a field at index time.

scorer

open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer

Computes any collection-level weight (e.g., IDF, average document length, etc.) needed for scoring a query.
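For DFI, the collection-level inputs that matter are the term's total frequency and the total number of tokens in the field; a SimilarityBase also derives the average field length from them. The minimal sketch below is modeled on upstream Lucene's SimilarityBase, which builds one per-term scorer for each TermStatistics argument and sums their scores when several are passed (e.g., for a phrase). The data classes are hypothetical stand-ins, not lucene-kmp's CollectionStatistics or TermStatistics.

```kotlin
// Hypothetical stand-ins for the statistics passed to scorer().
data class CollectionStats(val docCount: Long, val sumTotalTermFreq: Long)
data class TermStats(val docFreq: Long, val totalTermFreq: Long)

/** Hypothetical per-term bundle of the values a DFI scorer needs. */
data class DfiStats(
    val numberOfFieldTokens: Long,
    val avgFieldLength: Double,
    val totalTermFreq: Long,
)

fun collectStats(collection: CollectionStats, vararg terms: TermStats): List<DfiStats> {
    // Total number of tokens in the field across all documents.
    val numberOfFieldTokens = collection.sumTotalTermFreq
    // Average field length: tokens per document.
    val avgFieldLength = numberOfFieldTokens.toDouble() / collection.docCount
    // One stats object per term; a multi-term scorer would sum the per-term scores.
    return terms.map { DfiStats(numberOfFieldTokens, avgFieldLength, it.totalTermFreq) }
}

fun main() {
    val stats = collectStats(
        CollectionStats(docCount = 1_000, sumTotalTermFreq = 250_000),
        TermStats(docFreq = 40, totalTermFreq = 90),
    )
    println(stats)
}
```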

toString

open override fun toString(): String

Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
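Following that convention, upstream Lucene's DFISimilarity.toString() reports the independence measure in parentheses, e.g. DFI(ChiSquared). A hypothetical stand-alone sketch of the same pattern (the class names are illustrative, not lucene-kmp types):

```kotlin
/** Hypothetical measure type; only its name matters here. */
class ChiSquaredMeasure {
    override fun toString() = "ChiSquared"
}

/** Hypothetical similarity whose name embeds its only parameter. */
class DfiSketch(val independence: Any) {
    override fun toString() = "DFI($independence)"
}

fun main() {
    println(DfiSketch(ChiSquaredMeasure())) // DFI(ChiSquared)
}
```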
