Companion
object Companion
Properties
Email token type
Emoji token type
Hangul token type
Hiragana token type
Ideographic token type
Katakana token type
Numeric token type
Southeast Asian token type. Characters in the class \p{Line_Break = Complex_Context} come from Southeast Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Each run of these characters is kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX #29.
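The run-grouping behaviour described above can be illustrated with a minimal sketch. This is not the tokenizer's implementation, and the function name southeastAsianRuns is hypothetical. The JDK's Character class exposes no Line_Break property, so the sketch assumes script membership (Thai, Lao, Myanmar, Khmer) as an approximation of \p{Line_Break = Complex_Context}. Each maximal run of such code points becomes one token.

```kotlin
// Approximation (assumption): script membership stands in for
// \p{Line_Break = Complex_Context}, which java.lang.Character does not expose.
val complexContextScripts: Set<Character.UnicodeScript> = setOf(
    Character.UnicodeScript.THAI,
    Character.UnicodeScript.LAO,
    Character.UnicodeScript.MYANMAR,
    Character.UnicodeScript.KHMER,
)

// Hypothetical helper: returns each maximal run of Southeast Asian code points
// as a single token, without attempting any word segmentation inside the run.
fun southeastAsianRuns(text: String): List<String> {
    val runs = mutableListOf<String>()
    val current = StringBuilder()
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        if (Character.UnicodeScript.of(cp) in complexContextScripts) {
            // Still inside a run: keep accumulating.
            current.appendCodePoint(cp)
        } else if (current.isNotEmpty()) {
            // A non-Southeast-Asian code point ends the current run.
            runs.add(current.toString())
            current.setLength(0)
        }
        i += Character.charCount(cp) // step over surrogate pairs correctly
    }
    if (current.isNotEmpty()) runs.add(current.toString())
    return runs
}
```

For example, southeastAsianRuns("hello ภาษาไทย world") yields the whole Thai phrase as one token instead of splitting it into words. Because the sketch uses script rather than the real Line_Break property, it may misclassify some characters. Script-specific digits are one such case: they belong to the script but carry a different Line_Break value.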