StandardAnalyzer

Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
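
The order of the three stages can be illustrated with a minimal, library-free sketch. The function `analyzeSketch` is hypothetical, and the regular-expression split is only a simplified stand-in for StandardTokenizer, whose real segmentation rules are more involved. The point shown is that stop words are removed after lowercasing.

```kotlin
// Simplified stand-in for the StandardTokenizer -> LowerCaseFilter -> StopFilter chain.
// Assumption: the regex below is NOT the real tokenizer; it only marks where tokenization happens.
fun analyzeSketch(text: String, stopWords: Set<String>): List<String> =
    Regex("[\\p{L}\\p{N}]+").findAll(text)   // "tokenizer": runs of letters and digits
        .map { it.value.lowercase() }         // "LowerCaseFilter": lowercase each token
        .filter { it !in stopWords }          // "StopFilter": drop configured stop words
        .toList()

fun main() {
    // "The" is lowercased first, so it matches the stop word "the".
    println(analyzeSketch("The Quick brown Fox", setOf("the"))) // prints [quick, brown, fox]
}
```

Because filtering runs on the lowercased form, a stop word list written in lowercase also removes capitalized occurrences in this sketch.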

Since

3.1

Constructors

constructor(stopWords: CharArraySet)

Builds an analyzer with the given stop words.

constructor()

Builds an analyzer with no stop words.

constructor(stopwords: Reader)

Builds an analyzer with the stop words from the given reader.

Types

object Companion

Properties

Sets the maximum allowed token length. Tokens longer than this are split at this length and emitted as multiple tokens. To skip such long tokens instead, increase the maximum length and then use LengthFilter to remove them. The default is StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH.
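
The splitting behaviour described above can be sketched without the library. Both helper functions below are hypothetical illustrations, not part of the analyzer API: `chopToken` mimics splitting at the maximum length, and `dropLongTokens` mimics what a length-based filter would do when long tokens should be skipped rather than split.

```kotlin
// Hypothetical helper: split one over-long token into pieces of at most maxTokenLength characters.
fun chopToken(token: String, maxTokenLength: Int): List<String> {
    require(maxTokenLength > 0) { "maxTokenLength must be positive" }
    return token.chunked(maxTokenLength)
}

// Hypothetical helper: keep only tokens up to maxLength characters,
// roughly what filtering by length would do in place of splitting.
fun dropLongTokens(tokens: List<String>, maxLength: Int): List<String> =
    tokens.filter { it.length <= maxLength }

fun main() {
    println(chopToken("abcdefghij", 4))                    // prints [abcd, efgh, ij]
    println(dropLongTokens(listOf("ab", "abcdefghij"), 4)) // prints [ab]
}
```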

An immutable stopword set

Functions

open override fun close()

Frees persistent resources used by this Analyzer

open fun getOffsetGap(fieldName: String?): Int

Like getPositionIncrementGap, but applies to token offsets instead of positions. By default this returns 1. This method is only called if the field produced at least one token for indexing.
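
As a rough model of what an offset gap does, the sketch below assigns character offsets to tokens from several values of the same field. It is a simplification under stated assumptions, not the indexing code itself: `OffsetToken` and `assignOffsets` are hypothetical names, tokens are split on whitespace, and each value advances the running offset by its full length.

```kotlin
// Hypothetical token record: term text plus start (inclusive) and end (exclusive) offsets.
data class OffsetToken(val term: String, val start: Int, val end: Int)

// Assumed model: offsets of later values are shifted by the length of earlier values,
// plus offsetGap for every earlier value that produced at least one token.
fun assignOffsets(values: List<String>, offsetGap: Int): List<OffsetToken> {
    val out = mutableListOf<OffsetToken>()
    var base = 0
    for (value in values) {
        val matches = Regex("\\S+").findAll(value).toList() // whitespace split as a stand-in tokenizer
        for (m in matches) {
            out += OffsetToken(m.value, base + m.range.first, base + m.range.last + 1)
        }
        base += value.length
        if (matches.isNotEmpty()) base += offsetGap // gap applies only if tokens were produced
    }
    return out
}

fun main() {
    // With a gap of 1, "ef" starts at 5 (length of "ab cd") + 1 = 6.
    println(assignOffsets(listOf("ab cd", "ef"), offsetGap = 1))
}
```

With a gap of 0 in this model, the last token of one value and the first token of the next would have touching offsets, which is why a small non-zero gap keeps the values distinguishable.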

open fun getPositionIncrementGap(fieldName: String?): Int

Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default value position increment gap is
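
The effect of a position increment gap can be sketched with a small model. This is an illustration under assumptions rather than the indexing implementation: `assignPositions` is a hypothetical function, every token is assumed to have a position increment of 1, and the gap value is passed in explicitly instead of being read from an analyzer.

```kotlin
// Assumed model: tokens of each field instance get successive positions;
// before a later instance, the gap is added if terms were already added to the field.
fun assignPositions(instances: List<List<String>>, positionIncrementGap: Int): List<Pair<String, Int>> {
    val out = mutableListOf<Pair<String, Int>>()
    var position = -1
    for (tokens in instances) {
        if (out.isNotEmpty()) position += positionIncrementGap // gap between instances
        for (token in tokens) {
            position += 1 // assumed per-token position increment of 1
            out += token to position
        }
    }
    return out
}

fun main() {
    val instances = listOf(listOf("quick", "fox"), listOf("lazy", "dog"))
    println(assignPositions(instances, 0))  // prints [(quick, 0), (fox, 1), (lazy, 2), (dog, 3)]
    println(assignPositions(instances, 10)) // prints [(quick, 0), (fox, 1), (lazy, 12), (dog, 13)]
}
```

In this model, a large gap separates the positions of "fox" and "lazy", so a phrase match spanning the boundary between the two field instances would no longer see them as adjacent.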

fun normalize(fieldName: String, text: String): BytesRef

Normalize a string down to the representation that it would have in the index.

fun tokenStream(fieldName: String, text: String): TokenStream

Returns a TokenStream suitable for fieldName, tokenizing the contents of text.

fun tokenStream(fieldName: String, reader: Reader): TokenStream

Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.