LMSimilarity

abstract class LMSimilarity @JvmOverloads constructor(collectionModel: LMSimilarity.CollectionModel = DefaultCollectionModel(), discountOverlaps: Boolean = true) : SimilarityBase

Abstract superclass for language modeling Similarities. The following inner types are introduced:

  • LMStats, which defines a new statistic, the probability that the collection language model generates the current term;

  • CollectionModel, which is a strategy interface for object that compute the collection language model p(w|C);

  • DefaultCollectionModel, an implementation of the former, that computes the term probability as the number of occurrences of the term in the collection, divided by the total number of tokens.

Inheritors

Constructors

Link copied to clipboard
constructor(collectionModel: LMSimilarity.CollectionModel = DefaultCollectionModel(), discountOverlaps: Boolean = true)

Types

Link copied to clipboard
interface CollectionModel

A strategy for computing the collection language model.

Link copied to clipboard

Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.

Link copied to clipboard
class LMStats(field: String, boost: Double) : BasicStats

Stores the collection distribution of the current term.

Properties

Link copied to clipboard

True if overlap tokens (tokens with a position of increment of zero) are discounted from the document's length.

Link copied to clipboard
abstract val name: String

Returns the name of the LM method. The values of the parameters should be included as well.

Functions

Link copied to clipboard

Computes the normalization value for a field at index-time.

Link copied to clipboard
open override fun scorer(boost: Float, collectionStats: CollectionStatistics, vararg termStats: TermStatistics): Similarity.SimScorer

Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.

Link copied to clipboard
open override fun toString(): String

Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.