KoreanTokenizer
Tokenizer for Korean that uses morphological analysis.
Constructors
constructor()
Creates a new KoreanTokenizer with default parameters.
constructor(factory: AttributeFactory, userDictionary: UserDictionary?, mode: KoreanTokenizer.DecompoundMode, outputUnknownUnigrams: Boolean)
Creates a new KoreanTokenizer using the system and unknown dictionaries shipped with Lucene.
constructor(factory: AttributeFactory, userDictionary: UserDictionary?, mode: KoreanTokenizer.DecompoundMode, outputUnknownUnigrams: Boolean, discardPunctuation: Boolean)
Creates a new KoreanTokenizer using the system and unknown dictionaries shipped with Lucene, with explicit control over whether punctuation tokens are discarded.
constructor(factory: AttributeFactory, systemDictionary: TokenInfoDictionary, unkDictionary: UnknownDictionary, connectionCosts: ConnectionCosts, userDictionary: UserDictionary?, mode: KoreanTokenizer.DecompoundMode, outputUnknownUnigrams: Boolean, discardPunctuation: Boolean)
Creates a new KoreanTokenizer, supplying a custom system dictionary, unknown dictionary, and connection costs.
Types
DecompoundMode
Decompound mode: determines how the tokenizer handles POS.Type.COMPOUND, POS.Type.INFLECT and POS.Type.PREANALYSIS tokens.
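The effect of a decompound mode can be illustrated with a small, self-contained sketch. It does not call the library: ToyDecompoundMode, ToyEntry and emit are hypothetical names made up for illustration, and the constant names NONE, DISCARD and MIXED are assumed to match upstream Lucene (keep the compound, emit only its parts, or emit both).

```kotlin
// Hypothetical stand-in for KoreanTokenizer.DecompoundMode. The constant
// names follow upstream Lucene and are an assumption for this port.
enum class ToyDecompoundMode { NONE, DISCARD, MIXED }

// A toy dictionary entry: the surface form plus the parts it decomposes
// into. An empty parts list models a simple (non-compound) entry.
data class ToyEntry(val surface: String, val parts: List<String> = emptyList())

// Returns the tokens a tokenizer would emit for one entry under a mode.
fun emit(entry: ToyEntry, mode: ToyDecompoundMode): List<String> =
    if (entry.parts.isEmpty()) {
        // Nothing to decompose: the mode has no effect.
        listOf(entry.surface)
    } else when (mode) {
        // Keep the compound as a single token.
        ToyDecompoundMode.NONE -> listOf(entry.surface)
        // Emit only the decomposed parts and drop the original form.
        ToyDecompoundMode.DISCARD -> entry.parts
        // Emit the original form followed by its parts.
        ToyDecompoundMode.MIXED -> listOf(entry.surface) + entry.parts
    }
```

For a compound such as 가락지나물 with parts 가락지 and 나물, this sketch yields one token under NONE, two under DISCARD and three under MIXED. The real tokenizer additionally assigns offsets and position attributes, which the sketch leaves out.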
Functions
Expert: set this to produce Graphviz (dot) output of the Viterbi lattice.