CommonGramsFilter

class CommonGramsFilter(input: TokenStream, commonWords: CharArraySet?) : TokenFilter

Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid on them. The overlay is achieved by calling PositionIncrementAttribute.setPositionIncrement. Bigrams have a type of GRAM_TYPE. Example:

  • input: "the quick brown fox"
  • output: |"the","the-quick"|"quick"|"brown"|"fox"|
  • "the-quick" has a position increment of 0, so it is in the same position as "the"
  • "the-quick" has a term.type() of "gram"
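
The overlay can be illustrated by accumulating position increments into absolute positions. The following is a minimal, library-independent sketch: `Tok` and `positions` are hypothetical names invented for this illustration and are not part of the filter's API, and the "word" type for single terms is an assumption. The sample stream keeps every single term and overlays the gram on "the".

```kotlin
// Hypothetical stand-in for a token's term text, type, and position increment.
// This is NOT the library's attribute API, only an illustration.
data class Tok(val term: String, val type: String, val posInc: Int)

// Resolve absolute positions by summing position increments.
// Starting at -1 means the first token (increment 1) lands at position 0.
fun positions(tokens: List<Tok>): List<Pair<Int, String>> {
    var pos = -1
    return tokens.map { t ->
        pos += t.posInc
        pos to t.term
    }
}

fun main() {
    // Sample stream: single terms advance the position, the gram does not.
    // The "word" type for single terms is assumed for this sketch.
    val stream = listOf(
        Tok("the", "word", 1),
        Tok("the-quick", "gram", 0), // increment 0: same position as "the"
        Tok("quick", "word", 1),
        Tok("brown", "word", 1),
        Tok("fox", "word", 1),
    )
    for ((p, term) in positions(stream)) println("$p: $term")
}
```

Because the gram carries an increment of 0, it resolves to the same position as the single term before it, which is what lets a phrase query match either the gram or the individual words.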

Constructors

constructor(input: TokenStream, commonWords: CharArraySet?)

Types

object Companion

Properties

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T
open override fun close()
fun copyTo(target: AttributeSource)
open override fun end()
open operator override fun equals(obj: Any?): Boolean
fun <T : Attribute> getAttribute(attClass: KClass<T>): T?
fun hasAttribute(attClass: KClass<out Attribute>): Boolean
open override fun hashCode(): Int
open override fun incrementToken(): Boolean

Inserts bigrams for common words into a token stream. For each input token, output the token. If the token and/or the following token are in the list of common words, also output a bigram with position increment 0 and type="gram".
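
The rule described here can be sketched over a plain list of strings. This is a simplified illustration under stated assumptions, not the filter's actual implementation: `Emitted` and `commonGrams` are hypothetical names, the "word" type for single terms is assumed, and the separator is left as a parameter (defaulting to "-" to match the example notation on this page) because this page does not state which separator the library uses.

```kotlin
// Hypothetical output record: term text, token type, position increment.
data class Emitted(val term: String, val type: String, val posInc: Int)

// Sketch of the stated rule: emit every token, and when the token and/or
// the following token is a common word, also emit a bigram of type "gram"
// with position increment 0.
fun commonGrams(
    tokens: List<String>,
    commonWords: Set<String>,
    separator: String = "-", // assumption: not taken from the library
): List<Emitted> {
    val out = mutableListOf<Emitted>()
    for (i in tokens.indices) {
        val cur = tokens[i]
        out += Emitted(cur, "word", 1) // the single term is always kept
        val next = tokens.getOrNull(i + 1) ?: continue // last token: no bigram
        if (cur in commonWords || next in commonWords) {
            out += Emitted(cur + separator + next, "gram", 0)
        }
    }
    return out
}

fun main() {
    val result = commonGrams(listOf("the", "quick", "brown", "fox"), setOf("the"))
    result.forEach { println("${it.term} type=${it.type} posInc=${it.posInc}") }
}
```

In this sketch "quick brown" and "brown fox" produce no bigram because neither word of each pair is in the common-word set, while a pair such as "over the" would produce one because its second word is common.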

fun reflectAsString(prependAttClass: Boolean): String
open override fun reset()
open override fun toString(): String
open override fun unwrap(): TokenStream