NGramTokenFilter
class NGramTokenFilter(input: TokenStream, minGram: Int, maxGram: Int, preserveOriginal: Boolean) : TokenFilter
Tokenizes the input into n-grams of the given size(s). As of Lucene 4.4, this token filter:
- handles supplementary characters correctly,
- emits all n-grams for the same token at the same position,
- does not modify offsets,
- sorts n-grams by their offset in the original token first, then by increasing length (for example, "abc" yields "a", "ab", "abc", "b", "bc", "c").
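The ordering and the supplementary-character handling described in the list can be illustrated with a small standalone sketch. It does not use the filter itself: `splitCodePoints` and `nGramsInFilterOrder` are hypothetical helpers written only for this illustration, and the sketch ignores `preserveOriginal`, positions, and offsets.

```kotlin
// Hypothetical helper, not part of the library: splits a string into
// code points so that a surrogate pair is treated as a single character.
fun splitCodePoints(s: String): List<String> {
    val out = ArrayList<String>()
    var i = 0
    while (i < s.length) {
        val isPair = s[i].isHighSurrogate() &&
            i + 1 < s.length && s[i + 1].isLowSurrogate()
        val end = if (isPair) i + 2 else i + 1
        out.add(s.substring(i, end))
        i = end
    }
    return out
}

// Hypothetical helper, not part of the library: lists the n-grams of one
// token, ordered by start offset first and then by increasing length.
fun nGramsInFilterOrder(token: String, minGram: Int, maxGram: Int): List<String> {
    require(minGram in 1..maxGram) { "expected 1 <= minGram <= maxGram" }
    val codePoints = splitCodePoints(token)
    val grams = ArrayList<String>()
    for (start in codePoints.indices) {
        for (length in minGram..maxGram) {
            // Stop once the gram would run past the end of the token.
            if (start + length > codePoints.size) break
            grams.add(codePoints.subList(start, start + length).joinToString(""))
        }
    }
    return grams
}
```

Under these assumptions, `nGramsInFilterOrder("abc", 1, 3)` returns the grams in the same order as the "abc" example above, and a token containing a surrogate pair never produces a gram that splits the pair.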
If you were using this [TokenFilter] to perform partial highlighting, that approach no longer works, because the filter does not update offsets. Modify your analysis chain to use [NGramTokenizer] instead, and consider overriding [NGramTokenizer.isTokenChar] to perform pre-tokenization.
Functions