UnicodeWhitespaceTokenizer

A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace, as defined by Unicode's WHITESPACE property. Each maximal sequence of adjacent non-whitespace characters forms a token.
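
The splitting rule can be illustrated with a minimal, self-contained sketch. It does not use this class or show its internals; the helper names `isUnicodeWhitespace` and `unicodeWhitespaceTokens` are hypothetical and exist only for this example. The sketch hard-codes the 25 code points that carry the Unicode White_Space property and collects every maximal run of other characters as a token.

```kotlin
// The 25 code points that carry the Unicode White_Space property.
// All of them lie in the Basic Multilingual Plane, so testing one Char at a time is enough.
fun isUnicodeWhitespace(c: Char): Boolean = when (c.code) {
    in 0x0009..0x000D, 0x0020, 0x0085, 0x00A0, 0x1680,
    in 0x2000..0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000 -> true
    else -> false
}

// Hypothetical helper (not part of the documented API):
// returns the maximal runs of non-whitespace characters in the input.
fun unicodeWhitespaceTokens(text: String): List<String> {
    val tokens = mutableListOf<String>()
    val current = StringBuilder()
    for (c in text) {
        if (isUnicodeWhitespace(c)) {
            // A whitespace character ends the current token, if there is one.
            if (current.isNotEmpty()) {
                tokens.add(current.toString())
                current.clear()
            }
        } else {
            current.append(c)
        }
    }
    // Flush the final token when the text does not end in whitespace.
    if (current.isNotEmpty()) tokens.add(current.toString())
    return tokens
}

fun main() {
    // U+00A0 (no-break space) and U+3000 (ideographic space) both have the White_Space property.
    println(unicodeWhitespaceTokens("hello\u00A0world\tfoo  bar\u3000baz"))
    // prints [hello, world, foo, bar, baz]
}
```

Note that consecutive whitespace characters never produce empty tokens, and that the no-break space U+00A0 acts as a separator because it is part of the White_Space property.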

Constructors

constructor()

Construct a new UnicodeWhitespaceTokenizer.

constructor(factory: AttributeFactory)

Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory.

constructor(factory: AttributeFactory, maxTokenLen: Int)

Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory and a maximum token length, maxTokenLen.
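
This page does not say what happens to a run of non-whitespace characters longer than maxTokenLen. The sketch below assumes the parameter behaves as it does in Apache Lucene's CharTokenizer-based tokenizers, where an over-long run is emitted as several shorter tokens rather than dropped. The helper `splitByMaxTokenLen` is hypothetical and only illustrates that assumed behavior.

```kotlin
// Hypothetical helper (not part of the documented API): splits one non-whitespace run
// into pieces of at most maxTokenLen UTF-16 chars.
// Simplification: chunked() counts chars, so it could split a surrogate pair.
fun splitByMaxTokenLen(run: String, maxTokenLen: Int): List<String> {
    require(maxTokenLen > 0) { "maxTokenLen must be greater than 0" }
    return run.chunked(maxTokenLen)
}

fun main() {
    println(splitByMaxTokenLen("abcdefgh", 3)) // prints [abc, def, gh]
}
```

Under this assumption, a small maxTokenLen caps the length of every emitted token, while a run that already fits within the limit comes through unchanged.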

Types

object Companion

Properties

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T
open override fun close()
fun copyTo(target: AttributeSource)
open override fun end()
open operator override fun equals(obj: Any?): Boolean
fun <T : Attribute> getAttribute(attClass: KClass<T>): T?
fun hasAttribute(attClass: KClass<out Attribute>): Boolean
open override fun hashCode(): Int
open override fun incrementToken(): Boolean
fun reflectAsString(prependAttClass: Boolean): String
open override fun reset()
fun setReader(input: Reader)
open override fun toString(): String