IBSimilarity
Provides a framework for the family of information-based models, as described in: Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.
The retrieval function is of the form $RSV(q, d) = \sum -x_w^q \log \mathrm{Prob}(X_w \ge t_w^d \mid \lambda_w)$, where
$x_w^q$ is the query boost;
$X_w$ is a random variable that counts the occurrences of word $w$;
$t_w^d$ is the normalized term frequency;
$\lambda_w$ is a parameter.
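As a rough illustration of the retrieval function above, the following sketch evaluates the sum for the log-logistic case, where Prob(X ≥ t | λ) = λ / (t + λ). It is not the library's implementation: the class `IbFormulaSketch`, its method names, and the parallel-array layout are hypothetical, and the sum is assumed to run over the query terms that occur in the document.

```java
/** Hypothetical, self-contained sketch of the IB retrieval function; not Lucene code. */
final class IbFormulaSketch {

    /** Log-logistic tail probability: Prob(X >= tfn | lambda) = lambda / (tfn + lambda). */
    static double probAtLeast(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    /**
     * RSV(q, d) = sum over terms of -boost * log Prob(X >= tfn | lambda).
     * The three arrays are parallel: entry i describes the i-th matching query term.
     */
    static double rsv(double[] boosts, double[] tfns, double[] lambdas) {
        if (boosts.length != tfns.length || boosts.length != lambdas.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double score = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            score += -boosts[i] * Math.log(probAtLeast(tfns[i], lambdas[i]));
        }
        return score;
    }

    public static void main(String[] args) {
        // Two matching terms: a rare one (small lambda) and a common one (larger lambda).
        double score = rsv(
            new double[] {1.0, 1.0},
            new double[] {2.0, 2.0},
            new double[] {0.01, 0.5});
        System.out.println("RSV = " + score);
    }
}
```

With equal normalized term frequencies, the term with the smaller λ contributes more to the score, because observing it that often is less probable under the model.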
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.
To construct an IBSimilarity, you must specify implementations for all three components of the information-based model:
Distribution: Probabilistic distribution used to model term occurrence
  DistributionLL: Log-logistic
  DistributionSPL: Smoothed power-law
Lambda: $\lambda_w$ parameter of the probability distribution
  LambdaDF: $N_w/N$, the fraction of documents in which w occurs
  LambdaTTF: $F_w/N$, the average number of occurrences of w per document in the collection
Normalization: Term frequency normalization
  Any supported DFR normalization (listed in DFRSimilarity)
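To make the division of labour among the three components concrete, here is a minimal sketch that composes a distribution, a lambda, and a normalization into a single term score. Everything in it is an assumption made for illustration: the class `IbComponentsSketch` and its nested interfaces are hypothetical stand-ins rather than the library's types, the lambdas use the plain ratios given above without any smoothing the library may apply, and the normalization is an H2-style length normalization with its constant fixed at 1.

```java
/** Hypothetical stand-ins for the three IB components; not the Lucene API. */
final class IbComponentsSketch {

    interface Distribution { double score(double tfn, double lambda); }
    interface Lambda { double lambda(long docFreq, long totalTermFreq, long numDocs); }
    interface Normalization { double tfn(double tf, double docLen, double avgDocLen); }

    /** Log-logistic: -log Prob(X >= tfn | lambda), with Prob = lambda / (tfn + lambda). */
    static final Distribution LOG_LOGISTIC =
        (tfn, lambda) -> -Math.log(lambda / (tfn + lambda));

    /** N_w / N: documents containing the term over all documents. */
    static final Lambda LAMBDA_DF = (df, ttf, n) -> (double) df / n;

    /** F_w / N: total occurrences of the term over all documents. */
    static final Lambda LAMBDA_TTF = (df, ttf, n) -> (double) ttf / n;

    /** H2-style normalization (assumed constant c = 1): tf * log2(1 + avgDocLen / docLen). */
    static final Normalization H2 =
        (tf, len, avg) -> tf * (Math.log(1.0 + avg / len) / Math.log(2.0));

    /** Combines the three components into the score of one query term in one document. */
    static double termScore(Distribution dist, Lambda lam, Normalization norm,
                            double boost, double tf, double docLen, double avgDocLen,
                            long docFreq, long totalTermFreq, long numDocs) {
        double tfn = norm.tfn(tf, docLen, avgDocLen);
        double lambda = lam.lambda(docFreq, totalTermFreq, numDocs);
        return boost * dist.score(tfn, lambda);
    }

    public static void main(String[] args) {
        // A term seen twice in an average-length document; it occurs in 10 of 100 documents.
        double score = termScore(LOG_LOGISTIC, LAMBDA_DF, H2,
                                 1.0, 2.0, 50.0, 50.0, 10L, 25L, 100L);
        System.out.println("term score = " + score);
    }
}
```

Swapping `LAMBDA_DF` for `LAMBDA_TTF` changes only how λ is estimated, which is the point of keeping the three components independent.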
See also
Constructors
Properties
Functions
Computes the normalization value for a field at index time.
Computes any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.