IndriDirichletSimilarity
Bayesian smoothing using Dirichlet priors as implemented in the Indri search engine (http://www.lemurproject.org/indri.php). Indri Dirichlet Smoothing!
P(t|E) = (tf_E + mu*P(t|D)) / (documentLength + documentMu)
where P(t|D) = (mu*P(t|C) + tf_D) / (doclen + mu)
A larger value for mu produces more smoothing. Smoothing is most important for short documents, where the probabilities are more granular.
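To illustrate how mu affects short and long documents, the following is a minimal sketch that evaluates only the P(t|D) expression from the formula above. The function name `smoothedTermProbability` and its parameter names are hypothetical and are not part of this class; the sample numbers are made up for illustration.

```kotlin
// Hypothetical helper, not part of the library: evaluates
// P(t|D) = (mu*P(t|C) + tf_D) / (doclen + mu) from the formula above.
fun smoothedTermProbability(
    termFreq: Double,              // tf_D: occurrences of the term in the document
    docLen: Double,                // doclen: number of terms in the document
    collectionProbability: Double, // P(t|C): probability of the term in the collection
    mu: Double = 2000.0            // smoothing parameter; 2000 is the documented default
): Double = (mu * collectionProbability + termFreq) / (docLen + mu)

fun main() {
    val pC = 0.001 // assumed collection probability, for illustration only

    // Short document: 2 occurrences in 10 terms (unsmoothed estimate 0.2).
    val shortDoc = smoothedTermProbability(termFreq = 2.0, docLen = 10.0, collectionProbability = pC)

    // Long document with the same unsmoothed estimate: 2000 occurrences in 10000 terms.
    val longDoc = smoothedTermProbability(termFreq = 2000.0, docLen = 10000.0, collectionProbability = pC)

    println("short document: $shortDoc")
    println("long document:  $longDoc")
}
```

With these inputs the short document's estimate is pulled most of the way toward the collection probability, while the long document's estimate stays much closer to its unsmoothed ratio of 0.2.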
Constructors
Instantiates the similarity with the provided parameters.
constructor(collectionModel: LMSimilarity.CollectionModel = IndriCollectionModel(), mu: Float = 2000.0f)
Instantiates the similarity with the default mu value of 2000.
Instantiates the similarity with the provided parameter.
Types
Properties
Functions
Computes the normalization value for a field at index-time.
open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer
Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.