
PatternTokenizer

class PatternTokenizer : Tokenizer

This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

"pattern" is the regular expression.
"group" specifies which group of each match to extract into tokens.

group=-1 (the default) is equivalent to "split": the tokens are the same as the output of String.split, without empty tokens.
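The split mode can be approximated in plain Kotlin: split the input on the pattern, then discard empty strings. This is a minimal sketch of the documented behavior, not lucene-kmp's implementation. `splitTokens` is a hypothetical helper that works on a String rather than a Reader and ignores token offsets.

```kotlin
// Hypothetical helper: approximates PatternTokenizer with group = -1.
// The pattern matches the separators; the text between matches becomes tokens,
// and zero-length pieces are dropped because the tokenizer never emits empty tokens.
fun splitTokens(pattern: Regex, input: String): List<String> =
    input.split(pattern).filter { it.isNotEmpty() }

fun main() {
    // Leading, doubled and trailing separators yield empty strings in a plain split;
    // the filter removes them.
    println(splitTokens(Regex(",\\s*"), ",a, b,,c,")) // prints [a, b, c]
}
```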

Using group >= 0 selects the matching group as the token. For example, if you have:

pattern = '([^']+)'
group = 0
input = aaa 'bbb' 'ccc'

the output will be two tokens, 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).
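The group mode can be sketched in the same way with Kotlin's Regex.findAll: take the selected group of each match and skip empty values, since the tokenizer never emits zero-length tokens. `groupTokens` is a hypothetical helper, not part of lucene-kmp, and like the split sketch it ignores offsets and streaming input.

```kotlin
// Hypothetical helper: approximates PatternTokenizer with group >= 0.
// Each match contributes the text of the selected group (0 = the whole match);
// groups that did not participate or matched nothing are skipped.
fun groupTokens(pattern: Regex, group: Int, input: String): List<String> =
    pattern.findAll(input)
        .mapNotNull { it.groups[group]?.value }
        .filter { it.isNotEmpty() }
        .toList()

fun main() {
    val pattern = Regex("'([^']+)'")
    val input = "aaa 'bbb' 'ccc'"
    println(groupTokens(pattern, 0, input)) // prints ['bbb', 'ccc']
    println(groupTokens(pattern, 1, input)) // prints [bbb, ccc]
}
```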

NOTE: This Tokenizer does not output tokens that are of zero length.


Constructors

PatternTokenizer

constructor(pattern: Regex, group: Int)

Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).

constructor(factory: AttributeFactory, pattern: Regex, group: Int)

Creates a new PatternTokenizer that uses the given AttributeFactory and returns tokens from group (-1 for split functionality).

Properties

attributeClassesIterator

val attributeClassesIterator: Iterator<Any>

attributeFactory

val attributeFactory: AttributeFactory

attributeImplsIterator

val attributeImplsIterator: Iterator<AttributeImpl>

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T

addAttributeImpl

fun addAttributeImpl(att: AttributeImpl)

fun captureState(): AttributeSource.State?

clearAttributes

fun clearAttributes()

cloneAttributes

fun cloneAttributes(): AttributeSource

open override fun close()

fun copyTo(target: AttributeSource)

open override fun end()

fun endAttributes()

open operator override fun equals(obj: Any?): Boolean

fun <T : Attribute> getAttribute(attClass: KClass<T>): T?

fun hasAttribute(attClass: KClass<out Attribute>): Boolean

fun hasAttributes(): Boolean

open override fun hashCode(): Int

open override fun incrementToken(): Boolean

reflectAsString

fun reflectAsString(prependAttClass: Boolean): String

fun reflectWith(reflector: AttributeReflector)

removeAllAttributes

fun removeAllAttributes()

open override fun reset()

fun restoreState(state: AttributeSource.State?)

fun setReader(input: Reader)

open override fun toString(): String
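As a Tokenizer, PatternTokenizer is expected to be consumed in the usual Lucene TokenStream order: supply the input with setReader, call reset, call incrementToken until it returns false, then call end and close. The sketch below mimics that order in plain Kotlin so it runs without lucene-kmp. The class RegexTokenStreamSketch, its term/startOffset/endOffset properties, and the String argument to setReader are assumptions standing in for the real Reader input and attribute objects (in Lucene, CharTermAttribute and OffsetAttribute); it is not the library's implementation.

```kotlin
// Hypothetical stand-in for PatternTokenizer; NOT the lucene-kmp class.
// It mirrors the call order: setReader -> reset -> incrementToken* -> end -> close.
class RegexTokenStreamSketch(private val pattern: Regex, private val group: Int) {
    // One candidate token: its text and [start, end) character offsets.
    private data class Span(val text: String, val start: Int, val end: Int)

    private var input: String? = null
    private var spans: Iterator<Span> = emptyList<Span>().iterator()

    // Stand-ins for the term and offset attributes a real Tokenizer fills in.
    var term: String = ""
        private set
    var startOffset: Int = 0
        private set
    var endOffset: Int = 0
        private set

    // The real setReader takes a Reader; a String keeps the sketch self-contained.
    fun setReader(text: String) {
        input = text
    }

    fun reset() {
        val text = checkNotNull(input) { "setReader must be called before reset" }
        val candidates = if (group < 0) splitSpans(text) else groupSpans(text)
        // Zero-length tokens are never emitted.
        spans = candidates.filter { it.text.isNotEmpty() }.iterator()
    }

    fun incrementToken(): Boolean {
        if (!spans.hasNext()) return false
        val span = spans.next()
        term = span.text
        startOffset = span.start
        endOffset = span.end
        return true
    }

    // After the last token, both offsets point at the end of the input.
    fun end() {
        term = ""
        startOffset = input?.length ?: 0
        endOffset = startOffset
    }

    fun close() {
        input = null
        spans = emptyList<Span>().iterator()
    }

    // group = -1: the text between consecutive matches becomes the tokens.
    private fun splitSpans(text: String): Sequence<Span> = sequence {
        var start = 0
        for (m in pattern.findAll(text)) {
            yield(Span(text.substring(start, m.range.first), start, m.range.first))
            start = m.range.last + 1
        }
        yield(Span(text.substring(start), start, text.length))
    }

    // group >= 0: each match contributes the selected group, if it participated.
    private fun groupSpans(text: String): Sequence<Span> =
        pattern.findAll(text).mapNotNull { m ->
            m.groups[group]?.let { g -> Span(g.value, g.range.first, g.range.last + 1) }
        }
}

fun main() {
    val ts = RegexTokenStreamSketch(Regex("'([^']+)'"), 1)
    ts.setReader("aaa 'bbb' 'ccc'")
    ts.reset()
    while (ts.incrementToken()) {
        println("${ts.term} [${ts.startOffset}, ${ts.endOffset})")
    }
    ts.end()
    ts.close()
}
```

Running main should print bbb [5, 8) and then ccc [11, 14); passing -1 as the group switches the sketch to split mode.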