NGramTokenFilter

class NGramTokenFilter(input: TokenStream, minGram: Int, maxGram: Int, preserveOriginal: Boolean) : TokenFilter

Tokenizes the input into n-grams of the given size(s). As of Lucene 4.4, this token filter:

  • handles supplementary characters correctly,
  • emits all n-grams for the same token at the same position,
  • does not modify offsets,
  • sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").
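
The ordering described in the last bullet can be illustrated with a small standalone sketch. It does not use the library: `ngrams` is a hypothetical helper written only for this illustration, it reproduces just the documented sort order (start offset first, then increasing length), and it does not model `preserveOriginal`, positions, or offsets. Grams are counted in code points so that surrogate pairs stay intact, which is one reading of "handles supplementary characters correctly".

```kotlin
// Hypothetical helper, not part of the library: it only mirrors the documented ordering.
fun ngrams(token: String, minGram: Int, maxGram: Int): List<String> {
    require(minGram in 1..maxGram) { "expected 1 <= minGram <= maxGram" }

    // Record the start index of every code point so surrogate pairs are never split.
    val starts = mutableListOf<Int>()
    var i = 0
    while (i < token.length) {
        starts.add(i)
        val isPair = token[i].isHighSurrogate() &&
            i + 1 < token.length && token[i + 1].isLowSurrogate()
        i += if (isPair) 2 else 1
    }
    starts.add(token.length) // sentinel marking the end of the last code point

    val count = starts.size - 1 // number of code points in the token
    val result = mutableListOf<String>()
    for (start in 0 until count) {          // offset in the original token first
        for (len in minGram..maxGram) {     // then increasing length
            if (start + len > count) break  // gram would run past the end of the token
            result.add(token.substring(starts[start], starts[start + len]))
        }
    }
    return result
}

fun main() {
    println(ngrams("abc", 1, 3)) // [a, ab, abc, b, bc, c]
}
```

With `minGram = 1` and `maxGram = 3`, the token "abc" yields the same sequence as the example in the list above.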

If you relied on this [TokenFilter] for partial highlighting, that no longer works, because the filter does not update offsets. Modify your analysis chain to use [NGramTokenizer] instead and, if needed, override [NGramTokenizer.isTokenChar] to perform pre-tokenization.

Constructors

constructor(input: TokenStream, minGram: Int, maxGram: Int, preserveOriginal: Boolean)
constructor(input: TokenStream, gramSize: Int)

Types

object Companion

Properties

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T
open override fun close()
fun copyTo(target: AttributeSource)
open override fun end()
open operator override fun equals(obj: Any?): Boolean
fun <T : Attribute> getAttribute(attClass: KClass<T>): T?
fun hasAttribute(attClass: KClass<out Attribute>): Boolean
open override fun hashCode(): Int
override fun incrementToken(): Boolean
fun reflectAsString(prependAttClass: Boolean): String
open override fun reset()
open override fun toString(): String
open override fun unwrap(): TokenStream