PersianAnalyzer

Analyzer for Persian.

This Analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
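The behaviour described above can be approximated with the Kotlin standard library alone. The sketch below is illustrative only and is not this class's implementation: `analyzeSketch`, `normalizeVariants`, and `variantForms` are hypothetical names, and the character mapping is an assumption modelled on the variant forms named in the description (Farsi yeh and keheh folded to their Arabic counterparts). The real analyzer chain may include further filters.

```kotlin
// Illustrative sketch only; not the analyzer's actual implementation.

// Zero-width non-joiner, treated here as a token boundary.
const val ZWNJ = '\u200C'

// Assumed mapping of Persian variant letters to standardized forms.
val variantForms: Map<Char, Char> = mapOf(
    '\u06CC' to '\u064A', // FARSI YEH -> YEH
    '\u06A9' to '\u0643', // KEHEH -> KAF
)

// Replaces each variant letter with its standardized form.
fun normalizeVariants(token: String): String =
    token.map { variantForms[it] ?: it }.joinToString("")

// Splits on ZWNJ and whitespace, normalizes variants, then drops stop words.
// Stop words are compared after normalization, so the set must hold
// normalized forms.
fun analyzeSketch(text: String, stopwords: Set<String>): List<String> =
    text.replace(ZWNJ, ' ')
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .map(::normalizeVariants)
        .filter { it !in stopwords }
```

In this sketch a word written with a zero-width non-joiner between its prefix and stem yields two tokens, so a prefix listed in the stop word set is dropped while the stem is kept. That is one way to read the remark that "stemming" is accomplished via stopwords.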

Constructors

constructor(stopwords: CharArraySet, stemExclusionSet: CharArraySet)

Builds an analyzer with the given stop words and stem exclusion set.

constructor(stopwords: CharArraySet)

Builds an analyzer with the given stop words.

constructor()

Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

Types

object Companion

Properties


Functions

open override fun close()
open fun getOffsetGap(fieldName: String?): Int
open fun getPositionIncrementGap(fieldName: String?): Int
fun normalize(fieldName: String, text: String): BytesRef
fun tokenStream(fieldName: String, text: String): TokenStream
fun tokenStream(fieldName: String, reader: Reader): TokenStream