Package-level declarations

Types

class DecompoundToken(posTag: POS.Tag, surfaceForm: String, startOffset: Int, endOffset: Int, type: TokenType) : Token

A token that was generated from a compound.

class DictionaryToken(type: TokenType, morphAtts: KoMorphData, wordId: Int, surfaceForm: CharArray, offset: Int, length: Int, startOffset: Int, endOffset: Int) : Token

A token stored in a KoMorphData.

class KoreanAnalyzer(userDict: UserDictionary? = null, mode: KoreanTokenizer.DecompoundMode = KoreanTokenizer.DEFAULT_DECOMPOUND, stopTags: Set<POS.Tag> = KoreanPartOfSpeechStopFilter.DEFAULT_STOP_TAGS, outputUnknownUnigrams: Boolean = false) : Analyzer

Analyzer for Korean that uses morphological analysis.

A TokenFilter that normalizes Korean numbers to regular Arabic decimal numbers in half-width characters.
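The "half-width characters" part of that description can be illustrated with a minimal standalone sketch. It only maps full-width Arabic digits (U+FF10 to U+FF19) to their ASCII equivalents and does not attempt to parse Korean numerals. The function name `toHalfWidthDigits` is hypothetical and is not part of this package.

```kotlin
// Hypothetical helper, not the library's filter: converts full-width digits
// to half-width (ASCII) digits and leaves every other character untouched.
fun toHalfWidthDigits(text: String): String =
    text.map { c ->
        // '０'..'９' is the full-width digit range U+FF10..U+FF19.
        if (c in '０'..'９') '0' + (c - '０') else c
    }.joinToString("")

fun main() {
    println(toHalfWidthDigits("１２３")) // prints 123
}
```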

class KoreanPartOfSpeechStopFilter(input: TokenStream, stopTags: Set<POS.Tag> = DEFAULT_STOP_TAGS) : FilteringTokenFilter

Removes tokens that match a set of part-of-speech tags.
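The idea of filtering by part-of-speech tag can be sketched without the library. The example below is a simplified illustration, assuming tokens carry a single tag. `SketchTag`, `SketchToken`, and `removeStopTags` are hypothetical stand-ins, not the `POS.Tag`, `Token`, or filter classes listed on this page, and the tags on the sample tokens are hand-assigned for illustration.

```kotlin
// Hypothetical stand-ins for illustration only.
enum class SketchTag { NOUN, PARTICLE, VERB, ENDING }

data class SketchToken(val surfaceForm: String, val tag: SketchTag)

// Keeps only the tokens whose tag is not in the stop set.
fun removeStopTags(
    tokens: List<SketchToken>,
    stopTags: Set<SketchTag>,
): List<SketchToken> = tokens.filter { it.tag !in stopTags }

fun main() {
    val tokens = listOf(
        SketchToken("학교", SketchTag.NOUN),
        SketchToken("에", SketchTag.PARTICLE),
    )
    val kept = removeStopTags(tokens, setOf(SketchTag.PARTICLE, SketchTag.ENDING))
    println(kept.map { it.surfaceForm }) // prints [학교]
}
```

The real filter operates on a `TokenStream` rather than a list, so this sketch shows only the selection rule, not the streaming mechanics.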

Replaces term text with the ReadingAttribute, which is the Hangul transcription of Hanja characters.

Tokenizer for Korean that uses morphological analysis.

class POS

Part-of-speech classification for Korean, based on the Sejong corpus classification.

abstract class Token(surfaceForm: CharArray, offset: Int, length: Int, startOffset: Int, endOffset: Int, type: TokenType) : Token

Analyzed token with morphological data.