LMDirichletSimilarity

Bayesian smoothing using Dirichlet priors. From Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342.

The formula as defined in the paper assigns a negative score to documents that contain the term, but with fewer occurrences than predicted by the collection language model. The Lucene implementation returns 0 for such documents.
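
The clamping behaviour described above can be illustrated with a minimal standalone sketch. It assumes the usual Dirichlet-smoothed form of the score, `boost * (ln(1 + freq / (mu * p)) + ln(mu / (docLen + mu)))`, where `p` is the term's probability under the collection language model. The function `dirichletScore` and its parameter names are hypothetical and are not part of this class's API.

```kotlin
import kotlin.math.ln

/**
 * Hypothetical helper, not part of the library: a Dirichlet-smoothed
 * language-model score, clamped at zero.
 *
 * freq           term frequency in the document
 * docLen         document length in tokens
 * collectionProb probability of the term under the collection language model
 * mu             Dirichlet smoothing parameter
 * boost          query-time boost
 */
fun dirichletScore(
    freq: Double,
    docLen: Double,
    collectionProb: Double,
    mu: Double = 2000.0,
    boost: Double = 1.0,
): Double {
    val raw = boost * (ln(1.0 + freq / (mu * collectionProb)) + ln(mu / (docLen + mu)))
    // The raw value is negative when freq / docLen is below collectionProb,
    // i.e. the term occurs less often than the collection model predicts.
    return if (raw > 0.0) raw else 0.0
}

fun main() {
    // 5 occurrences in 100 tokens, far above a collection probability of 0.001.
    println(dirichletScore(freq = 5.0, docLen = 100.0, collectionProb = 0.001))
    // 1 occurrence in 1000 tokens where the collection model predicts about 10.
    println(dirichletScore(freq = 1.0, docLen = 1000.0, collectionProb = 0.01))
}
```

In this sketch the first call yields a positive score, while the second has a negative raw value and therefore returns 0.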

Constructors

constructor(collectionModel: LMSimilarity.CollectionModel, mu: Float = 2000.0f)

Instantiates the similarity with the given collection model; mu defaults to 2000.

constructor(collectionModel: LMSimilarity.CollectionModel, discountOverlaps: Boolean, mu: Float)

Instantiates the similarity with the provided parameters.

constructor(mu: Float = 2000.0f)

Instantiates the similarity; mu defaults to 2000.

Properties

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
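
As a rough illustration of what discounting overlaps means for the document length, consider the sketch below. It assumes each token carries a position increment and that an increment of zero marks an overlap, such as a synonym injected at the same position. The helper `effectiveLength` is hypothetical and is not the library's norm computation.

```kotlin
/**
 * Hypothetical helper, not part of the library: counts the tokens that
 * contribute to document length, optionally skipping overlap tokens.
 */
fun effectiveLength(positionIncrements: List<Int>, discountOverlaps: Boolean): Int {
    val total = positionIncrements.size
    // Overlap tokens share a position with the previous token (increment of zero).
    val overlaps = positionIncrements.count { it == 0 }
    return if (discountOverlaps) total - overlaps else total
}

fun main() {
    // "quick", then a synonym "fast" at the same position, then "fox".
    val increments = listOf(1, 0, 1)
    println(effectiveLength(increments, discountOverlaps = true))
    println(effectiveLength(increments, discountOverlaps = false))
}
```

With discounting enabled the synonym is not counted, so the three-token stream contributes a length of 2 rather than 3.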

val mu: Float

The Dirichlet smoothing parameter μ.

open override val name: String

Returns the name of the LM method. The values of the parameters should be included as well.

Functions

Computes the normalization value for a field at index-time.

open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer

Computes any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.

open override fun toString(): String

Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.