common/org.gnit.lucenekmp.analysis.cjk/CJKBigramFilter

CJKBigramFilter

class CJKBigramFilter : TokenFilter

Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJK types are set by these tokenizers, but you can also use the flags parameter of the CJKBigramFilter constructor to explicitly control which of the CJK scripts are turned into bigrams.

By default, when a CJK character has no adjacent characters to form a bigram, it is output in unigram form. If you want to always output both unigrams and bigrams, set the outputUnigrams flag in the three-argument CJKBigramFilter constructor. This can be used for a combined unigram+bigram approach.

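Unlike ICUTokenizer, StandardTokenizer does not split at script boundaries. Korean Hangul characters are treated the same as many other scripts' letters, and as a result, StandardTokenizer can produce tokens that mix Hangul and non-Hangul characters, e.g. "한국abc". Such mixed-script tokens are typed as <ALPHANUM> rather than <HANGUL>, and as a result, will not be converted to bigrams by CJKBigramFilter.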

In all cases, all non-CJK input is passed through unmodified.
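The bigramming behavior described above can be illustrated with a small standalone sketch. It does not use CJKBigramFilter or any other lucenekmp class; `bigramRun` is a hypothetical helper that takes one run of adjacent CJK characters and assumes each character occupies a single UTF-16 unit.

```kotlin
// Standalone illustration only: this does not call CJKBigramFilter or any lucenekmp class.
// `bigramRun` is a hypothetical helper; it assumes every character is one UTF-16 unit.
fun bigramRun(run: String, outputUnigrams: Boolean = false): List<String> {
    val chars = run.map { it.toString() }
    val out = mutableListOf<String>()
    val isolated = chars.size == 1
    for (i in chars.indices) {
        // A unigram is emitted when requested, or when the character has no neighbor.
        if (outputUnigrams || isolated) out += chars[i]
        // Every adjacent pair becomes one overlapping bigram.
        if (i + 1 < chars.size) out += chars[i] + chars[i + 1]
    }
    return out
}

fun main() {
    println(bigramRun("多くの"))                        // [多く, くの]
    println(bigramRun("多"))                            // [多]
    println(bigramRun("多くの", outputUnigrams = true)) // [多, 多く, く, くの, の]
}
```

The sketch only models the order and overlap of the emitted terms. The real filter also maintains offsets, position increments, and token types, which are omitted here.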

Constructors

CJKBigramFilter

constructor(input: TokenStream)

Calls CJKBigramFilter with the default flags, which in Lucene enable bigramming for all four scripts (HAN, HIRAGANA, KATAKANA, and HANGUL).

constructor(input: TokenStream, flags: Int)

Calls CJKBigramFilter with outputUnigrams set to false.

constructor(input: TokenStream, flags: Int, outputUnigrams: Boolean)

Creates a new CJKBigramFilter, specifying which writing systems should be bigrammed and whether unigrams should also be output.
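Because flags is a single Int, the selected writing systems are presumably combined as a bitmask. The sketch below shows that pattern in isolation. The constant names and values are assumptions that mirror Lucene's Java CJKBigramFilter; this page does not list them, and the Kotlin port may expose them differently, for example on its Companion object. `isBigrammed` is a hypothetical helper, not part of the library.

```kotlin
// Assumed values, mirroring the constants in Lucene's Java CJKBigramFilter.
// The Kotlin port may expose them differently (for example on its Companion object).
const val HAN = 1
const val HIRAGANA = 2
const val KATAKANA = 4
const val HANGUL = 8

// Hypothetical helper: true when `script` is enabled in the `flags` bitmask.
fun isBigrammed(flags: Int, script: Int): Boolean = (flags and script) != 0

fun main() {
    // Select only the two kana scripts by OR-ing their bits together.
    val kanaOnly = HIRAGANA or KATAKANA
    println(isBigrammed(kanaOnly, HIRAGANA)) // true
    println(isBigrammed(kanaOnly, HAN))      // false
}
```

A mask built this way would be passed as the flags argument of the two- or three-argument constructor.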

Types

object Companion

Properties

attributeClassesIterator

val attributeClassesIterator: Iterator<Any>

attributeFactory

val attributeFactory: AttributeFactory

attributeImplsIterator

val attributeImplsIterator: Iterator<AttributeImpl>

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T

addAttributeImpl

fun addAttributeImpl(att: AttributeImpl)

fun captureState(): AttributeSource.State?

clearAttributes

fun clearAttributes()

cloneAttributes

fun cloneAttributes(): AttributeSource

open override fun close()

fun copyTo(target: AttributeSource)

open override fun end()

fun endAttributes()

open operator override fun equals(obj: Any?): Boolean

fun <T : Attribute> getAttribute(attClass: KClass<T>): T?

fun hasAttribute(attClass: KClass<out Attribute>): Boolean

fun hasAttributes(): Boolean

open override fun hashCode(): Int

open override fun incrementToken(): Boolean

reflectAsString

fun reflectAsString(prependAttClass: Boolean): String

fun reflectWith(reflector: AttributeReflector)

removeAllAttributes

fun removeAllAttributes()

open override fun reset()

fun restoreState(state: AttributeSource.State?)

open override fun toString(): String

open override fun unwrap(): TokenStream