IndriDirichletSimilarity

Bayesian smoothing using Dirichlet priors, as implemented in the Indri search engine (http://www.lemurproject.org/indri.php). The Indri Dirichlet smoothing formula is:

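$$P(t|E) = \frac{\mathit{tf}_E + \mathit{mu} \cdot P(t|D)}{\mathit{documentLength} + \mathit{documentMu}}$$

where

$$P(t|D) = \frac{\mathit{mu} \cdot P(t|C) + \mathit{tf}_D}{\mathit{doclen} + \mathit{mu}}$$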

A larger value of mu produces more smoothing. Smoothing matters most for short documents, where the probability estimates are more granular.
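The effect of mu on short versus long documents can be sketched in a few lines of Kotlin. This is an illustrative sketch only: `dirichletSmoothed` is a hypothetical helper, not part of this class, and it covers just the single-level document estimate P(t|D) = (tf_D + mu * P(t|C)) / (doclen + mu). The collection probability of 0.001 and the document sizes are assumed values chosen for the example.

```kotlin
/**
 * Dirichlet-smoothed term probability for one document:
 * (tf + mu * P(t|C)) / (docLen + mu).
 * Hypothetical helper for illustration; not the library's implementation.
 */
fun dirichletSmoothed(tf: Double, docLen: Double, collectionProb: Double, mu: Double): Double {
    require(mu >= 0.0 && docLen + mu > 0.0) { "docLen + mu must be positive" }
    return (tf + mu * collectionProb) / (docLen + mu)
}

fun main() {
    val pC = 0.001 // assumed collection probability of the term

    // Both documents have the same unsmoothed estimate (tf / docLen = 0.1).
    for (mu in listOf(0.0, 500.0, 2000.0)) {
        val shortDoc = dirichletSmoothed(tf = 2.0, docLen = 20.0, collectionProb = pC, mu = mu)
        val longDoc = dirichletSmoothed(tf = 200.0, docLen = 2000.0, collectionProb = pC, mu = mu)
        println("mu=$mu short=$shortDoc long=$longDoc")
    }
}
```

As mu grows, both estimates move toward the collection probability, but the short document's estimate moves much further, because mu dominates its small length in the denominator.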

Constructors

constructor(collectionModel: LMSimilarity.CollectionModel, discountOverlaps: Boolean, mu: Float)

Instantiates the similarity with the provided parameters.

constructor(collectionModel: LMSimilarity.CollectionModel = IndriCollectionModel(), mu: Float = 2000.0f)

Instantiates the similarity with the default mu value of 2000.

constructor(mu: Float)

Instantiates the similarity with the provided mu parameter.

Types

Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.

Properties

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
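To make the overlap discount concrete, the following sketch computes a field length from a list of position increments. `fieldLength` is a hypothetical helper written for this illustration, and the three-token example (a synonym stacked at the same position as the word it expands) is an assumed input, not output from the library.

```kotlin
/**
 * Counts the tokens that contribute to a field's length.
 * Tokens with a position increment of zero are treated as overlaps
 * and skipped when discountOverlaps is true.
 * Hypothetical helper for illustration only.
 */
fun fieldLength(positionIncrements: List<Int>, discountOverlaps: Boolean): Int =
    if (discountOverlaps) positionIncrements.count { it != 0 } else positionIncrements.size

fun main() {
    // Assumed token stream: "quick" (+1), synonym "fast" (+0), "fox" (+1).
    val increments = listOf(1, 0, 1)
    println("discounted length = ${fieldLength(increments, discountOverlaps = true)}")
    println("raw length = ${fieldLength(increments, discountOverlaps = false)}")
}
```

Because the document length appears in the denominator of the smoothing formula, discounting overlaps keeps injected synonyms from making a document look longer than its original text.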

val mu: Float

The mu parameter.

open override val name: String

Returns the name of the LM method. The values of the parameters should be included as well.

Functions

Computes the normalization value for a field at index-time.

open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer

Computes any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.

open override fun toString(): String

Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.