core/org.gnit.lucenekmp.search.similarities/LMDirichletSimilarity

LMDirichletSimilarity

class LMDirichletSimilarity : LMSimilarity

Bayesian smoothing using Dirichlet priors. From Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342.

The formula as defined in the paper assigns a negative score to documents that contain the term, but with fewer occurrences than predicted by the collection language model. The Lucene implementation returns 0 for such documents.
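The clamping behaviour described above can be sketched as a standalone function. This is a minimal illustration, not the class's actual code: it assumes the port follows the Dirichlet scoring formula used by upstream Lucene, and `dirichletScore` and its parameter names are hypothetical. With a positive boost, the raw score is positive exactly when `freq > docLen * collectionProb`, that is, when the term occurs more often than the collection model predicts.

```kotlin
import kotlin.math.ln

// Hypothetical helper for illustration only; not part of the lucenekmp API.
// freq:           occurrences of the term in the document
// docLen:         document length in tokens
// collectionProb: probability of the term under the collection language model
// mu:             Dirichlet smoothing parameter (2000 is the documented default)
fun dirichletScore(
    freq: Double,
    docLen: Double,
    collectionProb: Double,
    mu: Double = 2000.0,
    boost: Double = 1.0
): Double {
    // Raw log-based score from the smoothing formula; it can be negative.
    val raw = boost * (ln(1.0 + freq / (mu * collectionProb)) + ln(mu / (docLen + mu)))
    // Negative scores are clamped to 0, as described above.
    return if (raw > 0.0) raw else 0.0
}
```

For example, a term seen once in a 10,000-token document with a collection probability of 0.01 is far rarer than the 100 occurrences the collection model predicts, so the sketch returns 0.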

Constructors

constructor(collectionModel: LMSimilarity.CollectionModel, mu: Float = 2000.0f)

Instantiates the similarity with the given collection model. mu defaults to 2000.

constructor(collectionModel: LMSimilarity.CollectionModel, discountOverlaps: Boolean, mu: Float)

Instantiates the similarity with the provided parameters.

constructor(mu: Float = 2000.0f)

Instantiates the similarity with the given mu, which defaults to 2000.

Properties

discountOverlaps

val discountOverlaps: Boolean

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
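The effect of this flag on the document length can be sketched as follows. This is an illustration under the assumption that the port mirrors upstream Lucene, where the discounted length is the token count minus the number of overlap tokens. `fieldLength` and its `positionIncrements` input are hypothetical and not part of the API.

```kotlin
// Hypothetical helper for illustration only; not part of the lucenekmp API.
// positionIncrements holds one entry per token in the field; an increment of 0
// marks an overlap token (for example, a synonym stacked at the same position).
fun fieldLength(positionIncrements: List<Int>, discountOverlaps: Boolean): Int {
    val total = positionIncrements.size
    val overlaps = positionIncrements.count { it == 0 }
    // When discounting, overlap tokens do not contribute to the length.
    return if (discountOverlaps) total - overlaps else total
}
```

With increments `[1, 1, 0, 1]`, the sketch yields a length of 3 when overlaps are discounted and 4 otherwise.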

mu

val mu: Float

The μ smoothing parameter of the Dirichlet prior.
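To show what this parameter controls, here is a minimal sketch of the Dirichlet-smoothed term probability from the Zhai and Lafferty paper cited above. `smoothedProbability` and its parameter names are hypothetical and are not part of this class.

```kotlin
// Hypothetical helper for illustration only; not part of the lucenekmp API.
// Dirichlet-smoothed estimate of p(term | document):
//   (freq + mu * collectionProb) / (docLen + mu)
fun smoothedProbability(
    freq: Double,
    docLen: Double,
    collectionProb: Double,
    mu: Double
): Double = (freq + mu * collectionProb) / (docLen + mu)
```

With mu = 0 the estimate is the plain relative frequency `freq / docLen`. As mu grows, the estimate moves toward `collectionProb`, so short documents are smoothed more heavily than long ones for the same mu.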

name

open override val name: String

Returns the name of the LM method. The values of the parameters should be included as well.

Functions

computeNorm

open fun computeNorm(state: FieldInvertState): Long

Computes the normalization value for a field at index-time.

scorer

open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer

Computes any collection-level weight (e.g. IDF, average document length) needed for scoring a query.

toString

open override fun toString(): String

Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.