KhmerAnalyzer

Analyzer for Khmer text.

Optionally applies character-level normalization via KhmerNormalizationCharFilter, then tokenizes the text into grapheme clusters using GraphemeClusterTokenizer, and finally reorders characters within each token using CharReorderFilter.

Parameters

normalizationlevel

Normalization level. 0 = none; 1 = formally confusable (default); 2 = level 1 plus informally confusable; 3 = level 2 plus digit mapping and more.
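
The levels read as cumulative: each one keeps what the level below it does. The sketch below is only an illustration of that reading. The function `enabledNormalizations` and its step names are hypothetical and not part of KhmerAnalyzer, and rejecting out-of-range values is a choice made for this sketch, not documented behavior.

```kotlin
// Hypothetical helper, not part of the documented API: lists which
// normalization steps a given normalizationlevel is described as enabling.
private val normalizationSteps = listOf(
    "formally confusable",    // added at level 1 (the documented default)
    "informally confusable",  // added at level 2
    "digit mapping and more", // added at level 3
)

fun enabledNormalizations(level: Int): List<String> {
    // Assumption: only levels 0..3 are meaningful, since the documentation
    // describes no others. The real analyzer may treat other values differently.
    require(level in 0..3) { "normalizationlevel must be in 0..3, got $level" }
    // Level n enables the first n steps, so level 0 enables none.
    return normalizationSteps.take(level)
}
```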

Constructors

constructor()
constructor(normalizationlevel: Int)
constructor(normalizationlevel: Int, enableStopwords: Boolean, khmerNumber: Boolean)
constructor(normalizationlevel: Int, enableStopwords: Boolean, khmerNumber: Boolean, stopwords: CharArraySet)

Types

object Companion

Properties


Functions

open override fun close()
open fun getOffsetGap(fieldName: String?): Int
open fun getPositionIncrementGap(fieldName: String?): Int
fun normalize(fieldName: String, text: String): BytesRef
fun tokenStream(fieldName: String, text: String): TokenStream
fun tokenStream(fieldName: String, reader: Reader): TokenStream
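
A minimal sketch of consuming the stream returned by tokenStream, assuming the usual Lucene workflow of reset, incrementToken, end, and close. The helper `collectTokens` is hypothetical, and the imports assume the Apache Lucene JVM package names. If this class lives in a port with different packages or a different `addAttribute` signature, adjust accordingly.

```kotlin
// Assumption: Apache Lucene JVM package names. Adjust these imports if the
// project exposes Analyzer and CharTermAttribute under a different package.
import org.apache.lucene.analysis.Analyzer
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute

// Hypothetical helper (not part of KhmerAnalyzer): collects the term text of
// every token the given analyzer emits for `text`.
fun collectTokens(analyzer: Analyzer, fieldName: String, text: String): List<String> {
    val tokens = mutableListOf<String>()
    // `use` closes the stream even if token iteration throws.
    analyzer.tokenStream(fieldName, text).use { stream ->
        // The attribute must be obtained before iterating. It is updated in
        // place on every successful incrementToken() call.
        val term = stream.addAttribute(CharTermAttribute::class.java)
        stream.reset()
        while (stream.incrementToken()) {
            tokens += term.toString()
        }
        // Signals end-of-stream so final offset state is recorded.
        stream.end()
    }
    return tokens
}
```

Because the helper takes any Analyzer, it could be called as `collectTokens(KhmerAnalyzer(1), "body", khmerText)`, where `"body"` and `khmerText` are placeholder values. With this analyzer, each returned string would be one grapheme cluster after normalization and reordering.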