GraphemeClusterTokenizer

Tokenizes a string into Khmer grapheme clusters (not phonetic syllables). For instance, "ខ្ញុំចង់ធ្វើការ" is tokenized as "ខ្ញុំ", "ច", "ង់", "ធ្វើ", "កា", "រ", not as "ខ្ញុំ", "ចង់", "ធ្វើ", "ការ". It uses a simple state machine to do so.
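
The clustering behaviour described above can be illustrated with a small standalone sketch. It is not this class's implementation: the `Cat` enum, `categoryOf`, and `khmerClusters` are hypothetical names, and the code point ranges are an assumption based on the Unicode Khmer block (consonants and independent vowels as bases, dependent vowels and signs as cluster-internal marks, U+17D2 as the coeng). In this sketch, digits and non-Khmer characters simply end the current cluster and are dropped, which may differ from what the tokenizer does.

```kotlin
// Hypothetical character categories for this sketch; they are not the
// library's CHCAT_* constants.
enum class Cat { BASE, INSIDE, COENG, OTHER }

// Assumed mapping from a character to its category (Unicode Khmer block).
fun categoryOf(ch: Char): Cat = when (ch.code) {
    0x17D2 -> Cat.COENG                      // coeng: subscripts the next consonant
    in 0x1780..0x17B3 -> Cat.BASE            // consonants and independent vowels
    in 0x17B4..0x17D1, 0x17DD -> Cat.INSIDE  // dependent vowels and signs
    else -> Cat.OTHER                        // everything else ends a cluster
}

fun khmerClusters(text: String): List<String> {
    val clusters = mutableListOf<String>()
    val current = StringBuilder()
    var afterCoeng = false  // true when the previous character was a coeng

    // Emits the cluster built so far, if any, and resets the state.
    fun flush() {
        if (current.isNotEmpty()) {
            clusters.add(current.toString())
            current.clear()
        }
        afterCoeng = false
    }

    for (ch in text) {
        when (categoryOf(ch)) {
            Cat.BASE -> {
                // A base starts a new cluster unless it follows a coeng.
                if (!afterCoeng) flush()
                current.append(ch)
                afterCoeng = false
            }
            Cat.COENG -> if (current.isNotEmpty()) {
                current.append(ch)
                afterCoeng = true
            }
            Cat.INSIDE -> if (current.isNotEmpty()) {
                current.append(ch)
                afterCoeng = false
            }
            Cat.OTHER -> flush()
        }
    }
    flush()
    return clusters
}
```

Under these assumptions, `khmerClusters("ខ្ញុំចង់ធ្វើការ")` yields the six clusters listed in the description. A base character only joins the current cluster when it directly follows a coeng, which is why "ច" and "ង់" come out as separate tokens.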

Constructors

constructor()

Properties

val CHCAT_BASE: Int = 1
val CHCAT_COENG: Int = 3
val CHCAT_DIGIT: Int = 4
val CHCAT_IGNORE: Int = 5
val CHCAT_INSIDE: Int = 2
val ST_INIT: Int = 4
val ST_INSIDESYL: Int = 1

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T
fun category(c: Int): Int
open override fun close()
fun copyTo(target: AttributeSource)
open override fun end()
open operator override fun equals(obj: Any?): Boolean
fun <T : Attribute> getAttribute(attClass: KClass<T>): T?
fun hasAttribute(attClass: KClass<out Attribute>): Boolean
open override fun hashCode(): Int
open override fun incrementToken(): Boolean
fun reflectAsString(prependAttClass: Boolean): String
open override fun reset()
fun setReader(input: Reader)
open override fun toString(): String