LMSimilarity
Abstract superclass for language modeling Similarities. The following inner types are introduced:
LMStats, which defines a new statistic, the probability that the collection language model generates the current term;
CollectionModel, which is a strategy interface for objects that compute the collection language model p(w|C);
DefaultCollectionModel, an implementation of the former, that computes the term probability as the number of occurrences of the term in the collection, divided by the total number of tokens plus one.
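The inner types listed above follow a strategy pattern: the similarity holds a pluggable model for p(w|C) and stores the result in a per-term statistics object. The sketch below shows that structure in plain Java. All class names and fields here (`Stats`, `LmSimilaritySketch`, `fillStats`, `totalTermFreq`, `numberOfFieldTokens`) are hypothetical stand-ins, not Lucene's actual classes or signatures, and the default estimate follows the wording of the description in this page.

```java
// Hypothetical sketch of the LMSimilarity inner types; not Lucene's real API.
final class LmTypesSketch {

    /** Hypothetical per-term statistics, analogous in spirit to LMStats. */
    static final class Stats {
        final long totalTermFreq;       // occurrences of the term in the collection
        final long numberOfFieldTokens; // total tokens in the field across the collection
        double collectionProbability;   // p(w|C), filled in by a CollectionModel

        Stats(long totalTermFreq, long numberOfFieldTokens) {
            this.totalTermFreq = totalTermFreq;
            this.numberOfFieldTokens = numberOfFieldTokens;
        }
    }

    /** Strategy interface: how to estimate the collection language model p(w|C). */
    interface CollectionModel {
        double computeProbability(Stats stats);
    }

    /** Default strategy, as described: term count divided by (total tokens + 1). */
    static final class DefaultCollectionModel implements CollectionModel {
        @Override
        public double computeProbability(Stats stats) {
            return (double) stats.totalTermFreq / (stats.numberOfFieldTokens + 1);
        }
    }

    /** The similarity receives the strategy instead of hard-coding one estimate. */
    static final class LmSimilaritySketch {
        private final CollectionModel model;

        LmSimilaritySketch(CollectionModel model) {
            this.model = model;
        }

        LmSimilaritySketch() {
            this(new DefaultCollectionModel());
        }

        /** Builds the per-term statistics and stores p(w|C) computed by the model. */
        Stats fillStats(long totalTermFreq, long numberOfFieldTokens) {
            Stats stats = new Stats(totalTermFreq, numberOfFieldTokens);
            stats.collectionProbability = model.computeProbability(stats);
            return stats;
        }
    }

    public static void main(String[] args) {
        Stats s = new LmSimilaritySketch().fillStats(3, 999);
        System.out.println(s.collectionProbability); // 3 / (999 + 1)
    }
}
```

Because the sketch's `CollectionModel` has a single abstract method, a different estimate can be supplied as a lambda, e.g. `new LmSimilaritySketch(st -> 0.5)`, without changing the similarity itself.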
Inheritors
Constructors
Types
A strategy for computing the collection language model.
Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
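Written out, the estimate described in the line above is the following. The symbols are notation introduced here, not names from the API: ttf(w) is the number of occurrences of term w in the collection and N is the total number of tokens.

```latex
p(w \mid C) = \frac{\mathrm{ttf}(w)}{N + 1}
\qquad\text{e.g. } \mathrm{ttf}(w) = 3,\; N = 999
\;\Rightarrow\; p(w \mid C) = \frac{3}{1000} = 0.003
```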
Stores the collection distribution of the current term.
Properties
Functions
Computes the normalization value for a field at index time.
Computes any collection-level weight (e.g., IDF or average document length) needed for scoring a query.
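To make "collection-level weight" concrete, the sketch below derives, from a tiny in-memory collection, the counts a language-model similarity needs before scoring one query term: document count, total tokens, average field length, the term's collection frequency, and p(w|C) using the default estimate described on this page. The collection representation (a list of token lists for a single field) and every name in the code are hypothetical; a real index keeps these counts as precomputed statistics rather than scanning documents.

```java
import java.util.List;

// Hypothetical sketch: collection-level statistics for one query term.
final class CollectionStatsSketch {

    /** Hypothetical bundle of the collection-level values used at query time. */
    record QueryTermWeight(long docCount, long numberOfFieldTokens, double avgFieldLength,
                           long totalTermFreq, double collectionProbability) {}

    /** Scans a toy single-field collection; each inner list is one document's tokens. */
    static QueryTermWeight computeWeight(List<List<String>> docs, String term) {
        long docCount = docs.size();
        long tokens = 0;
        long termFreq = 0;
        for (List<String> doc : docs) {
            tokens += doc.size();                // field length of this document
            for (String t : doc) {
                if (t.equals(term)) termFreq++;  // occurrences of the query term
            }
        }
        double avgFieldLength = docCount == 0 ? 0.0 : (double) tokens / docCount;
        // p(w|C) following the default description: count / (total tokens + 1)
        double pwc = (double) termFreq / (tokens + 1);
        return new QueryTermWeight(docCount, tokens, avgFieldLength, termFreq, pwc);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("language", "model", "smoothing"),
                List.of("language", "model"),
                List.of("query", "likelihood", "language"));
        // 8 tokens in 3 documents; "language" occurs 3 times, so p(w|C) = 3 / 9
        System.out.println(computeWeight(docs, "language"));
    }
}
```

These values depend only on the collection and the query term, not on any single document, which is why they can be computed once per query term and reused for every document scored.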