Package-level declarations

Types

Analyzer for Japanese that uses morphological analysis.

Replaces the term text with the token's base (dictionary) form, as given by its BaseFormAttribute.
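
As a rough illustration of the replacement rule, here is a standalone sketch that does not use the library. `MorphToken` is a hypothetical stand-in for a token plus its BaseFormAttribute value. The sketch assumes that a missing base form (null) leaves the surface form unchanged, and the token data is illustrative only.

```kotlin
// Hypothetical stand-in for a token and its BaseFormAttribute value (null = none).
data class MorphToken(val surface: String, val baseForm: String?)

// Emits the base (dictionary) form when one is available,
// otherwise keeps the surface form unchanged.
fun toBaseForms(tokens: List<MorphToken>): List<String> =
    tokens.map { it.baseForm ?: it.surface }

fun main() {
    // Illustrative data: 食べ is the stem of the inflected verb 食べた, dictionary form 食べる.
    val tokens = listOf(MorphToken("食べ", "食べる"), MorphToken("た", null))
    println(toBaseForms(tokens)) // [食べる, た]
}
```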

Factory for org.gnit.lucenekmp.analysis.ja.JapaneseBaseFormFilter.

Analyzer for Japanese completion suggester.

A TokenFilter that adds romanized forms of Japanese tokens to the term attribute while also keeping the original tokens (surface forms). Its main use is query auto-completion.
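
The idea of keeping each surface form and adding a romanized variant can be sketched without the library. This is a minimal sketch under several assumptions: `ReadingToken`, the tiny `ROMAJI` table and the greedy two-kana-first matching are all hypothetical, and token positions are ignored. It is not the filter's actual romanization algorithm.

```kotlin
// Deliberately tiny, hypothetical katakana-to-romaji table; a real one covers all kana.
val ROMAJI = mapOf(
    "キョ" to "kyo", "ト" to "to", "ウ" to "u", "ス" to "su", "シ" to "shi",
)

// Greedy longest-match romanization (two-kana digraphs first).
// Returns null as soon as a kana is missing from the table.
fun romanize(katakana: String): String? {
    val out = StringBuilder()
    var i = 0
    while (i < katakana.length) {
        val digraph = if (i + 2 <= katakana.length) ROMAJI[katakana.substring(i, i + 2)] else null
        if (digraph != null) {
            out.append(digraph)
            i += 2
        } else {
            out.append(ROMAJI[katakana.substring(i, i + 1)] ?: return null)
            i += 1
        }
    }
    return out.toString()
}

// Hypothetical token: surface form plus its katakana reading (null if unknown).
data class ReadingToken(val surface: String, val reading: String?)

// Keeps every surface form and appends its romanized reading when one can be built.
// Positions are ignored; a real TokenStream would have to stack the extra term.
fun completionTerms(tokens: List<ReadingToken>): List<String> =
    tokens.flatMap { token ->
        val romaji = token.reading?.let { romanize(it) }
        if (romaji == null) listOf(token.surface) else listOf(token.surface, romaji)
    }

fun main() {
    val tokens = listOf(ReadingToken("東京", "トウキョウ"), ReadingToken("寿司", "スシ"))
    println(completionTerms(tokens)) // [東京, toukyou, 寿司, sushi]
}
```

Keeping the surface form alongside the romaji is what lets a prefix such as "tou" or "東" both match during auto-completion.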

Factory for JapaneseCompletionFilter.

A TokenFilter that normalizes small hiragana letters (捨て仮名, sutegana) to their normal-sized forms. For instance, "ちょっとまって" is translated to "ちよつとまつて".
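
The mapping itself is a simple character-for-character substitution. The standalone sketch below applies it to a plain String rather than a token stream. The function name is hypothetical, and the table is limited to the twelve small hiragana listed.

```kotlin
// Small hiragana (sutegana) mapped to their normal-sized counterparts.
val SMALL_TO_NORMAL_HIRAGANA = mapOf(
    'ぁ' to 'あ', 'ぃ' to 'い', 'ぅ' to 'う', 'ぇ' to 'え', 'ぉ' to 'お',
    'っ' to 'つ', 'ゃ' to 'や', 'ゅ' to 'ゆ', 'ょ' to 'よ', 'ゎ' to 'わ',
    'ゕ' to 'か', 'ゖ' to 'け',
)

// Replaces each small hiragana with its normal-sized form; other characters pass through.
fun normalizeSmallHiragana(text: String): String =
    text.map { SMALL_TO_NORMAL_HIRAGANA[it] ?: it }.joinToString("")

fun main() {
    println(normalizeSmallHiragana("ちょっとまって")) // ちよつとまつて
}
```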

Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
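
A simplified version of the expansion can be written as a single pass over the text. This is a sketch under stated assumptions, not the library's char filter. It handles only ゝ, ヽ and 々 (repeat the previous character) and ゞ, ヾ (repeat it with a voicing mark). It voices only the kana in the hypothetical `VOICEABLE` string, whose voiced form is the next code point. It does not expand runs of several marks as a sequence.

```kotlin
// Kana whose voiced (dakuten) form is the next code point, e.g. す (U+3059) -> ず (U+305A).
val VOICEABLE = "かきくけこさしすせそたちつてとはひふへほカキクケコサシスセソタチツテトハヒフヘホ"

// Replaces each iteration mark with the character written just before it.
fun expandIterationMarks(text: String): String {
    val out = StringBuilder()
    for (c in text) {
        val prev = out.lastOrNull()
        val expanded: Char? = when (c) {
            'ゝ', 'ヽ', '々' -> prev // plain repetition
            'ゞ', 'ヾ' -> prev?.let { if (it in VOICEABLE) it + 1 else it } // voiced repetition
            else -> null
        }
        out.append(expanded ?: c) // keep the mark itself when there is nothing to repeat
    }
    return out.toString()
}

fun main() {
    println(expandIterationMarks("こゝろ")) // こころ
    println(expandIterationMarks("いすゞ")) // いすず
    println(expandIterationMarks("時々"))   // 時時
}
```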

A TokenFilter that normalizes common katakana spelling variations ending in the long-sound mark ー (U+30FC) by removing that mark.
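
The stemming rule fits in a few lines. In this minimal sketch, the function names are hypothetical. The length threshold, with a default assumed to match the Java Lucene filter's value of 4, and the requirement that the whole term be katakana are assumptions about how the rule is guarded.

```kotlin
// True if the term is non-empty and every character is in the Katakana block (U+30A0..U+30FF).
fun isAllKatakana(term: String): Boolean =
    term.isNotEmpty() && term.all { it in '\u30A0'..'\u30FF' }

// Drops a trailing long-sound mark ー (U+30FC) from katakana terms with at least
// minimumLength characters; shorter or non-katakana terms are returned unchanged.
fun stemKatakana(term: String, minimumLength: Int = 4): String =
    if (term.length >= minimumLength && isAllKatakana(term) && term.last() == '\u30FC') {
        term.dropLast(1)
    } else {
        term
    }

fun main() {
    println(stemKatakana("コンピューター")) // コンピュータ
    println(stemKatakana("ルー"))           // ルー (too short to stem)
}
```

The threshold keeps very short words, where ー carries much of the meaning, from being conflated.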

A TokenFilter that normalizes small katakana letters (捨て仮名, sutegana) to their normal-sized forms. For instance, "ストップウォッチ" is translated to "ストツプウオツチ".
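
In the main Katakana block, most small letters sit exactly one code point below their full-size counterpart. This standalone sketch exploits that layout. The function name is hypothetical. Small katakana outside the main block, such as the Katakana Phonetic Extensions at U+31F0..U+31FF, are deliberately not covered.

```kotlin
// Maps small katakana in the main Katakana block to their normal-sized forms.
fun normalizeSmallKatakana(text: String): String = buildString {
    for (c in text) {
        append(
            when (c) {
                // Full-size form is the next code point, e.g. ッ (U+30C3) -> ツ (U+30C4).
                'ァ', 'ィ', 'ゥ', 'ェ', 'ォ', 'ッ', 'ャ', 'ュ', 'ョ', 'ヮ' -> c + 1
                'ヵ' -> 'カ' // U+30F5 -> U+30AB
                'ヶ' -> 'ケ' // U+30F6 -> U+30B1
                else -> c
            }
        )
    }
}

fun main() {
    println(normalizeSmallKatakana("ストップウォッチ")) // ストツプウオツチ
}
```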

A TokenFilter that normalizes Japanese numbers written in kanji (kansūji) to regular Arabic decimal numbers.
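
The core of kansūji parsing is to separate digits, the small multipliers 十/百/千, and the large group units 万/億. The sketch below is a much simplified, hypothetical parser. It assumes well-formed input and has no decimals, no full-width Arabic digits and no 兆. It supports both the multiplier style (三万五千) and the positional style (二〇二四).

```kotlin
val DIGITS = mapOf(
    '〇' to 0L, '一' to 1L, '二' to 2L, '三' to 3L, '四' to 4L,
    '五' to 5L, '六' to 6L, '七' to 7L, '八' to 8L, '九' to 9L,
)
val SMALL_UNITS = mapOf('十' to 10L, '百' to 100L, '千' to 1_000L)
val LARGE_UNITS = mapOf('万' to 10_000L, '億' to 100_000_000L)

// Parses a kansūji string into a Long; returns null for empty or non-kansūji input.
fun kansujiToLong(text: String): Long? {
    if (text.isEmpty()) return null
    var total = 0L            // completed 万/億 groups
    var section = 0L          // value accumulated below the next 万/億
    var pending: Long? = null // digits not yet attached to a multiplier
    for (c in text) {
        val digit = DIGITS[c]
        val small = SMALL_UNITS[c]
        val large = LARGE_UNITS[c]
        when {
            digit != null -> pending = (pending ?: 0L) * 10 + digit   // positional, e.g. 二〇二四
            small != null -> {
                section += (pending ?: 1L) * small                  // a bare 十 means 10
                pending = null
            }
            large != null -> {
                val group = section + (pending ?: 0L)
                total += (if (group == 0L) 1L else group) * large
                section = 0L
                pending = null
            }
            else -> return null // not a kansūji character
        }
    }
    return total + section + (pending ?: 0L)
}

fun main() {
    println(kansujiToLong("三万五千")) // 35000
}
```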

Removes tokens that match a set of part-of-speech tags.
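
The filtering rule can be sketched over plain data. `TaggedToken` is a hypothetical stand-in for a token and its part-of-speech attribute. The sketch assumes exact string matching against the stop set and keeps tokens that carry no tag. The IPADIC-style tags in the example are illustrative.

```kotlin
// Hypothetical stand-in for a token and its part-of-speech tag (null = untagged).
data class TaggedToken(val surface: String, val partOfSpeech: String?)

// Keeps a token unless its tag is exactly one of the stop tags; untagged tokens are kept.
fun removeStopTags(tokens: List<TaggedToken>, stopTags: Set<String>): List<TaggedToken> =
    tokens.filter { token -> token.partOfSpeech.let { pos -> pos == null || pos !in stopTags } }

fun main() {
    val tokens = listOf(
        TaggedToken("私", "名詞-代名詞-一般"),
        TaggedToken("は", "助詞-係助詞"),
        TaggedToken("学生", "名詞-一般"),
        TaggedToken("です", "助動詞"),
    )
    println(removeStopTags(tokens, setOf("助詞-係助詞")).map { it.surface }) // [私, 学生, です]
}
```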

Factory for org.gnit.lucenekmp.analysis.ja.JapanesePartOfSpeechStopFilter.

class JapaneseReadingFormFilter(input: TokenStream, useRomaji: Boolean = false) : TokenFilter

A TokenFilter that replaces the term text with the token's reading, in either katakana or romaji. The default reading form is katakana; romaji is used when useRomaji is true.
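
The choice made by `useRomaji` can be sketched without the library. `TokenWithReading` and the caller-supplied `romanize` function are hypothetical. The fallbacks are assumptions meant to mirror the Java Lucene filter: in katakana mode, a token with no reading keeps its term; in romaji mode, the term text itself is romanized.

```kotlin
// Hypothetical stand-in for a token and its reading (katakana, or null if unknown).
data class TokenWithReading(val term: String, val reading: String?)

// Katakana mode: the reading replaces the term when one exists, otherwise the term stays.
// Romaji mode: the reading (or, failing that, the term text) is passed to the romanizer.
fun readingForm(
    token: TokenWithReading,
    useRomaji: Boolean = false,
    romanize: (String) -> String = { it }, // hypothetical romanizer supplied by the caller
): String =
    if (useRomaji) romanize(token.reading ?: token.term) else token.reading ?: token.term

fun main() {
    val toyTable = mapOf("トウキョウ" to "toukyou") // illustrative only
    val tokyo = TokenWithReading("東京", "トウキョウ")
    println(readingForm(tokyo))                                        // トウキョウ
    println(readingForm(tokyo, useRomaji = true) { toyTable[it] ?: it }) // toukyou
}
```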

Factory for org.gnit.lucenekmp.analysis.ja.JapaneseReadingFormFilter.

Tokenizer for Japanese that uses morphological analysis.

Factory for org.gnit.lucenekmp.analysis.ja.JapaneseTokenizer.

class Token(surfaceForm: CharArray, offset: Int, length: Int, startOffset: Int, endOffset: Int, morphId: Int, type: TokenType, morphData: JaMorphData) : Token

Analyzed token with morphological data from its dictionary.
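
The constructor parameters `surfaceForm`, `offset` and `length` suggest that a token's text is a window into a shared character buffer rather than a separate String. The class below is a simplified, hypothetical illustration of that idea under this assumption. It omits `morphId`, `type` and `morphData` and is not the real Token.

```kotlin
// Hypothetical simplified token: its text is the window [offset, offset + length)
// of a character buffer that other tokens may share.
class SimpleMorphToken(
    private val surfaceForm: CharArray,
    private val offset: Int,
    private val length: Int,
    val startOffset: Int, // start position in the original input
    val endOffset: Int,   // end position (exclusive) in the original input
) {
    init {
        require(offset >= 0 && length >= 0 && offset + length <= surfaceForm.size) {
            "surface window out of bounds"
        }
    }

    // Materializes the surface text only when it is requested.
    val surface: String
        get() = surfaceForm.concatToString(offset, offset + length)
}

fun main() {
    val buffer = "東京都に住む".toCharArray()
    val tokyo = SimpleMorphToken(buffer, 0, 2, startOffset = 0, endOffset = 2)
    println(tokyo.surface) // 東京
}
```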