Analyzer

An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.

To define what analysis is done, subclasses must define their TokenStreamComponents in createComponents. The components are then reused in each call to tokenStream.
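The reuse described above can be pictured with a small self-contained model. This is only a sketch: SketchAnalyzer and SketchComponents are hypothetical stand-ins written for illustration, not the library's classes, and the single cached field is an assumption that simplifies whatever storage the real Analyzer uses.

```java
// Hypothetical stand-ins for illustration; these are NOT the library's classes.
final class SketchComponents {
    final String fieldName;
    String text; // current input, replaced each time the components are reused

    SketchComponents(String fieldName) {
        this.fieldName = fieldName;
    }

    void setInput(String text) {
        this.text = text;
    }
}

abstract class SketchAnalyzer {
    // Assumption: one cached instance; the real class may store this differently.
    private SketchComponents cached;
    int createCalls = 0;

    // Subclasses define the analysis chain here.
    protected abstract SketchComponents createComponents(String fieldName);

    // Builds the components on first use, then only swaps the input.
    final SketchComponents tokenStream(String fieldName, String text) {
        if (cached == null) {
            cached = createComponents(fieldName);
            createCalls++;
        }
        cached.setInput(text);
        return cached;
    }
}

final class ReuseSketch {
    static SketchAnalyzer newAnalyzer() {
        return new SketchAnalyzer() {
            @Override
            protected SketchComponents createComponents(String fieldName) {
                return new SketchComponents(fieldName);
            }
        };
    }

    public static void main(String[] args) {
        SketchAnalyzer analyzer = newAnalyzer();
        SketchComponents first = analyzer.tokenStream("body", "first document");
        SketchComponents second = analyzer.tokenStream("body", "second document");
        System.out.println(first == second);      // prints true: same object reused
        System.out.println(analyzer.createCalls); // prints 1
    }
}
```

In this model the expensive step (building the chain) happens once, and each later call only resets the input, which is the point of defining the chain in createComponents rather than in tokenStream.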

Simple example:

Analyzer analyzer = new Analyzer() {
Since

3.1

Inheritors
AnalyzerWrapper
StopwordAnalyzerBase

Types

Companion

object Companion

ReuseStrategy

abstract class ReuseStrategy

Strategy defining how TokenStreamComponents are reused per call to tokenStream.
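To make the idea of a reuse strategy concrete, the sketch below contrasts two possible policies: one cached instance shared by every field, and one cached instance per field name. The interface, both implementations, and all method names are illustrative assumptions and do not reproduce the ReuseStrategy API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reuse strategy; names are illustrative only.
interface SketchReuseStrategy {
    Object get(String fieldName);

    void set(String fieldName, Object components);
}

// Policy 1: a single cached instance shared by all fields.
final class SharedReuse implements SketchReuseStrategy {
    private Object cached;

    public Object get(String fieldName) {
        return cached;
    }

    public void set(String fieldName, Object components) {
        cached = components;
    }
}

// Policy 2: one cached instance per field name.
final class PerFieldReuse implements SketchReuseStrategy {
    private final Map<String, Object> cache = new HashMap<>();

    public Object get(String fieldName) {
        return cache.get(fieldName);
    }

    public void set(String fieldName, Object components) {
        cache.put(fieldName, components);
    }
}

final class StrategySketch {
    // Counts how often components must be built for a sequence of field names.
    static int buildsNeeded(SketchReuseStrategy strategy, String... fields) {
        int builds = 0;
        for (String field : fields) {
            if (strategy.get(field) == null) {
                strategy.set(field, new Object()); // placeholder for real components
                builds++;
            }
        }
        return builds;
    }

    public static void main(String[] args) {
        System.out.println(buildsNeeded(new SharedReuse(), "title", "body", "title"));   // prints 1
        System.out.println(buildsNeeded(new PerFieldReuse(), "title", "body", "title")); // prints 2
    }
}
```

A shared policy builds less but only fits analyzers whose chain is the same for every field; a per-field policy costs one build per field and allows field-specific chains.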

TokenStreamComponents

open class TokenStreamComponents(source: Consumer<Reader>, result: TokenStream)

This class encapsulates the outer components of a token stream. It provides access to the source (a Consumer of Reader) and the outer end (sink), which also serves as the TokenStream returned by tokenStream.

Properties

reuseStrategy

val reuseStrategy: Analyzer.ReuseStrategy

Returns the used ReuseStrategy.

storedValue

var storedValue: CloseableThreadLocal<Any>?

Functions

close

open override fun close()

Frees persistent resources used by this Analyzer.

getOffsetGap

open fun getOffsetGap(fieldName: String?): Int

Just like getPositionIncrementGap, except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.

getPositionIncrementGap

open fun getPositionIncrementGap(fieldName: String?): Int

Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default value position increment gap is
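The effect of the two gaps on a multi-valued field can be worked through with a small model. This is a simplified sketch of the bookkeeping, not the indexer's actual code: it assumes whitespace tokenization, a position increment of 1 per token, and that each value's final offset equals its length. GapSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of position and offset gaps between values of one field.
final class GapSketch {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Token positions across several values that share a field name.
    static List<Integer> positions(int positionGap, String... values) {
        List<Integer> out = new ArrayList<>();
        int position = -1;
        for (String value : values) {
            // The gap applies only once terms have already been added.
            if (!out.isEmpty()) {
                position += positionGap;
            }
            Matcher m = TOKEN.matcher(value);
            while (m.find()) {
                position++; // assumed increment of 1 per token
                out.add(position);
            }
        }
        return out;
    }

    // Token start offsets across several values that share a field name.
    static List<Integer> startOffsets(int offsetGap, String... values) {
        List<Integer> out = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            boolean producedToken = false;
            Matcher m = TOKEN.matcher(value);
            while (m.find()) {
                out.add(base + m.start());
                producedToken = true;
            }
            base += value.length();
            // The offset gap is added only if this value produced a token.
            if (producedToken) {
                base += offsetGap;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions(0, "a b", "c d"));    // prints [0, 1, 2, 3]
        System.out.println(positions(100, "a b", "c d"));  // prints [0, 1, 102, 103]
        System.out.println(startOffsets(1, "a b", "c d")); // prints [0, 2, 4, 6]
    }
}
```

With a gap of 0 the tokens of both values sit in consecutive positions, so a phrase such as "b c" would span the value boundary in this model; a large gap such as 100 keeps the values apart.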

normalize

fun normalize(fieldName: String, text: String): BytesRef

Normalize a string down to the representation that it would have in the index.

tokenStream

fun tokenStream(fieldName: String, text: String): TokenStream

Returns a TokenStream suitable for fieldName, tokenizing the contents of text.

fun tokenStream(fieldName: String, reader: Reader): TokenStream

Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.