ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it combines adjacent tokens into a single token.
For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".
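As a rough illustration of the example above, the following Python sketch builds two-token shingles from a list of words. It is not the filter's implementation: the function name `bigram_shingles` and the whitespace split are assumptions made for this example only.

```python
def bigram_shingles(tokens, separator=" "):
    """Join each pair of adjacent tokens into a single shingle string."""
    # zip pairs token i with token i + 1, so n tokens give n - 1 shingles.
    return [separator.join(pair) for pair in zip(tokens, tokens[1:])]


# Hypothetical tokenization: a plain whitespace split stands in for a real tokenizer.
tokens = "please divide this sentence into shingles".split()
shingles = bigram_shingles(tokens)
print(shingles)
```

Six input tokens produce five shingles, one for each pair of neighbours.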
This filter handles position increments greater than 1 by inserting filler tokens (tokens with the term text "_"). It does not handle a position increment of 0.
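The filler behaviour can be pictured with a small Python sketch. It assumes a simplified model in which each token is a (term, position increment) pair and every skipped position becomes one "_" filler before shingling. The helper names `expand_gaps` and `bigram_shingles` are hypothetical, and the real filter may treat edge cases (such as trailing gaps) differently.

```python
def expand_gaps(tokens_with_increments, filler="_"):
    """Flatten (term, increment) pairs, adding one filler per skipped position."""
    expanded = []
    for term, increment in tokens_with_increments:
        if increment < 1:
            # Mirrors the documented limitation: an increment of 0 is not handled.
            raise ValueError("position increment of 0 is not handled")
        expanded.extend([filler] * (increment - 1))
        expanded.append(term)
    return expanded


def bigram_shingles(tokens, separator=" "):
    """Join each pair of adjacent tokens into a single shingle string."""
    return [separator.join(pair) for pair in zip(tokens, tokens[1:])]


# Assumed scenario: "this" was removed upstream, so "sentence" arrives with increment 2.
stream = [("please", 1), ("divide", 1), ("sentence", 2)]
filled = expand_gaps(stream)
gap_shingles = bigram_shingles(filled)
print(gap_shingles)
```

In this model the gap left by the removed token shows up in the shingles as "divide _" and "_ sentence", rather than as a misleading "divide sentence".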
Constructors
Constructs a ShingleFilter with the specified minimum and maximum shingle sizes from the TokenStream input.
Constructs a ShingleFilter with the specified maximum shingle size from the TokenStream input.
Constructs a ShingleFilter with the default shingle size of 2.
Constructs a ShingleFilter with the specified token type for shingle tokens and the default shingle size of 2.
Properties
Functions
Sets the string to insert for each position at which there is no token (i.e., when position increment is greater than one).
Sets the maximum shingle size (default: 2).
Sets the minimum shingle size (default: 2).
Sets whether the output stream contains the input tokens (unigrams) as well as shingles (default: true).
Sets whether to override outputUnigrams==false when no shingles are available because the input stream has fewer than minShingleSize tokens (default: false).
Sets the string to use when joining adjacent tokens to form a shingle.
Sets the type of the shingle tokens produced by this filter (default: "shingle").
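To see how these settings interact, here is a self-contained Python sketch that models them on a plain list of strings. It is an approximation written for illustration, not the filter's code: the function `shingle` and its parameter names are hypothetical, the single-space default separator and the requirement that sizes be at least 2 are assumptions, and token types and filler tokens are not modelled.

```python
def shingle(tokens, min_size=2, max_size=2, output_unigrams=True,
            output_unigrams_if_no_shingles=False, separator=" "):
    """Return unigrams and/or shingles for a list of string tokens."""
    # Assumption of this sketch: a shingle spans at least two tokens.
    if min_size < 2 or max_size < min_size:
        raise ValueError("expected 2 <= min_size <= max_size")
    out = []
    made_shingle = False
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        # Emit every shingle that starts here, from shortest to longest.
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break
            out.append(separator.join(tokens[start:start + size]))
            made_shingle = True
    # Fall back to unigrams only when nothing else would be emitted.
    if not made_shingle and not output_unigrams and output_unigrams_if_no_shingles:
        out = list(tokens)
    return out


words = ["a", "b", "c"]
with_unigrams = shingle(words)
shingles_only = shingle(words, output_unigrams=False)
up_to_trigrams = shingle(words, max_size=3, output_unigrams=False)
too_short = shingle(["a"], output_unigrams=False)
fallback = shingle(["a"], output_unigrams=False, output_unigrams_if_no_shingles=True)
print(with_unigrams, shingles_only, up_to_trigrams, too_short, fallback)
```

The last two calls show why the override exists: with unigrams disabled, an input shorter than the minimum shingle size would otherwise produce no tokens at all.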