common/org.gnit.lucenekmp.analysis.fa/PersianAnalyzer

PersianAnalyzer

class PersianAnalyzer : StopwordAnalyzerBase

Analyzer for Persian.

This analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner (U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
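The behavior described above can be approximated in a few lines of standalone Kotlin. This is only a rough sketch, not the lucenekmp implementation: the helpers `normalizePersian` and `analyzePersian` are hypothetical names, a plain whitespace split stands in for the real tokenizer, and only the two variant forms named above (Farsi yeh and keheh) are mapped. The mapping targets (Arabic yeh and Arabic kaf) and the order of the steps are assumptions based on how upstream Lucene handles Persian.

```kotlin
// Illustrative approximation only; not the library's actual analysis chain.
const val ZWNJ = '\u200C' // zero-width non-joiner

// Hypothetical helper: map Persian-specific letter variants to a standard form.
fun normalizePersian(token: String): String =
    token
        .replace('\u06CC', '\u064A') // Farsi yeh -> Arabic yeh (assumed target)
        .replace('\u06A9', '\u0643') // keheh -> Arabic kaf (assumed target)

// Hypothetical helper: treat ZWNJ as a token boundary, split on whitespace,
// normalize each token, then drop stop words.
fun analyzePersian(text: String, stopwords: Set<String>): List<String> =
    text.replace(ZWNJ, ' ')
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .map(::normalizePersian)
        .filter { it !in stopwords }

fun main() {
    // Four tokens once the ZWNJ is treated as a boundary; the second-to-last
    // token (U+0648, "and") is supplied as the only stop word.
    val text = "\u06A9\u062A\u0627\u0628\u200C\u0647\u0627 \u0648 \u06CC\u06A9"
    val tokens = analyzePersian(text, setOf("\u0648"))
    println(tokens.size) // three tokens remain after stop word removal
}
```

Note that the stop word check runs after normalization in this sketch, so a stop word list would need to be written in the normalized form to match.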

Constructors

PersianAnalyzer

constructor(stopwords: CharArraySet, stemExclusionSet: CharArraySet)

Builds an analyzer with the given stop words and stem exclusion set.

constructor(stopwords: CharArraySet)

Builds an analyzer with the given stop words.

constructor()

Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

Types

Companion

object Companion

Properties

reuseStrategy

val reuseStrategy: Analyzer.ReuseStrategy

stopwords

val stopwords: CharArraySet

storedValue

var storedValue: CloseableThreadLocal<Any>?

Functions

close

open override fun close()

getOffsetGap

open fun getOffsetGap(fieldName: String?): Int

getPositionIncrementGap

open fun getPositionIncrementGap(fieldName: String?): Int

normalize

fun normalize(fieldName: String, text: String): BytesRef

tokenStream

fun tokenStream(fieldName: String, text: String): TokenStream

fun tokenStream(fieldName: String, reader: Reader): TokenStream