core/org.gnit.lucenekmp.analysis

Package-level declarations

Types

AbstractAnalysisFactory

abstract class AbstractAnalysisFactory

Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.

AnalysisSPILoader

class AnalysisSPILoader<S : AbstractAnalysisFactory>(clazz: KClass<S>, classloader: ClassLoader? = null)

Helper class for loading named SPIs (e.g. Tokenizers, TokenStreams) from the classpath.
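
As a rough illustration of what "loading named SPIs" means, the sketch below keeps a case-insensitive map from a short name to a constructor that takes a Map<String, String> of configuration arguments. FactoryRegistrySketch and its methods are hypothetical. They are not the AnalysisSPILoader API, and how the real loader discovers implementations is not modeled here.

```kotlin
// Hypothetical registry illustrating name-based factory lookup; not the AnalysisSPILoader API.
class FactoryRegistrySketch<T> {
    // Keys are stored lower-cased so that lookups ignore case.
    private val constructors = mutableMapOf<String, (Map<String, String>) -> T>()

    fun register(name: String, ctor: (Map<String, String>) -> T) {
        constructors[name.lowercase()] = ctor
    }

    // Creates a new instance for the given name, passing its configuration arguments.
    fun newInstance(name: String, args: Map<String, String> = emptyMap()): T {
        val ctor = constructors[name.lowercase()]
            ?: throw IllegalArgumentException("No factory registered under the name '$name'")
        return ctor(args)
    }

    // All registered names, in their normalized (lower-case) form.
    val availableNames: Set<String>
        get() = constructors.keys.toSet()
}
```

For example, after register("LowerCase") { args -> ... }, a call to newInstance("lowercase") resolves to the same constructor.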

AnalysisSPIReflection

expect object AnalysisSPIReflection

AnalysisSPIRegistry

object AnalysisSPIRegistry

Analyzer

abstract class Analyzer : AutoCloseable

An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
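
To make "policy" concrete, here is a plain-Kotlin sketch of one such policy: split on whitespace, lower-case each term, then drop stop words while keeping their position gaps. The Token class and the functions below are hypothetical stand-ins for a Tokenizer, LowerCaseFilter and StopFilter; the real classes stream tokens one at a time through attributes instead of building lists.

```kotlin
// Hypothetical token record: term text, character offsets, and position increment.
data class Token(val term: String, val start: Int, val end: Int, val posInc: Int = 1)

// Tokenizer stage: split on whitespace, keeping each token's character offsets.
fun whitespaceTokens(text: String): List<Token> =
    Regex("\\S+").findAll(text)
        .map { Token(it.value, it.range.first, it.range.last + 1) }
        .toList()

// Filter stage: lower-case every term (the role LowerCaseFilter plays).
fun lowerCaseTokens(tokens: List<Token>): List<Token> =
    tokens.map { it.copy(term = it.term.lowercase()) }

// Filter stage: drop stop words, adding their position increments to the next kept token.
fun removeStopWords(tokens: List<Token>, stopWords: Set<String>): List<Token> {
    val kept = mutableListOf<Token>()
    var skipped = 0
    for (t in tokens) {
        if (t.term in stopWords) {
            skipped += t.posInc
        } else {
            kept.add(t.copy(posInc = t.posInc + skipped))
            skipped = 0
        }
    }
    return kept
}

// The "analyzer": a fixed policy that chains one tokenizer and two filters.
fun analyze(text: String, stopWords: Set<String>): List<Token> =
    removeStopWords(lowerCaseTokens(whitespaceTokens(text)), stopWords)
```

Swapping any stage (a different tokenizer, an extra filter) yields a different policy, which is the sense in which an Analyzer decides how index terms are extracted.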

AnalyzerWrapper

abstract class AnalyzerWrapper : Analyzer

Extension to Analyzer suitable for Analyzers which wrap other Analyzers.

AutomatonToTokenStream

class AutomatonToTokenStream

Converts an Automaton into a TokenStream.

CachingTokenFilter

class CachingTokenFilter(input: TokenStream) : TokenFilter

This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. It caches all token attribute states locally in a List when incrementToken is called for the first time. Subsequent calls use the cache.
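
The caching behavior can be sketched as follows: the first incrementToken() drains the whole input into a list, and later calls (including after reset()) replay from that list without touching the input again. TermSource, CountingSource and CachingSketch are hypothetical and cache only the term text, whereas the real filter caches full attribute states.

```kotlin
// Hypothetical minimal stream interface; not the real TokenStream API.
interface TermSource {
    val term: String
    fun incrementToken(): Boolean
}

// Source over a fixed list that counts how often it is pulled.
class CountingSource(private val terms: List<String>) : TermSource {
    var pulls = 0
        private set
    private var index = -1
    override val term: String get() = terms[index]
    override fun incrementToken(): Boolean {
        pulls++
        return ++index < terms.size
    }
}

class CachingSketch(private val input: TermSource) : TermSource {
    private var cache: List<String>? = null
    private var pos = 0 // number of cached tokens already replayed

    override val term: String
        get() = cache!![pos - 1]

    override fun incrementToken(): Boolean {
        // On the first call, consume the entire input and store every term.
        val tokens = cache ?: fillCache()
        if (pos >= tokens.size) return false
        pos++
        return true
    }

    private fun fillCache(): List<String> {
        val filled = mutableListOf<String>()
        while (input.incrementToken()) filled.add(input.term)
        cache = filled
        return filled
    }

    // Rewinds to the start of the cache; the input is never read again.
    fun reset() {
        pos = 0
    }
}
```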

CharacterUtils

object CharacterUtils

Utility class to write tokenizers or token filters.

CharArrayMap

open class CharArrayMap<V> : AbstractMutableMap<Any, V>

A simple class that stores key Strings as char[]s in a hash table. Note that this is not a general-purpose class. For example, it cannot remove items from the map, nor does it resize its hash table to be smaller. It is designed for quick retrieval of items by char[] keys without first converting them to a String.

CharArraySet

open class CharArraySet : AbstractMutableSet<Any>

A simple class that stores Strings as char[]s in a hash table. Note that this is not a general-purpose class. For example, it cannot remove items from the set, nor does it resize its hash table to be smaller. It is designed to quickly test whether a char[] is in the set without first converting it to a String.
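
The key idea, hashing and comparing a slice of a CharArray directly so that no String is allocated per lookup, can be sketched with a small open-addressing table. SimpleCharArraySet is hypothetical; it uses linear probing and a 31-based hash for simplicity, and the real class may probe and hash differently. Like the real class, it grows but never shrinks and offers no removal.

```kotlin
// Hypothetical open-addressing set keyed by CharArray slices; not the real CharArraySet.
class SimpleCharArraySet(expectedSize: Int) {
    private var table: Array<CharArray?> = arrayOfNulls(tableSize(expectedSize))
    var size = 0
        private set

    // Power-of-two capacity keeping the load factor at or below 0.5.
    private fun tableSize(n: Int): Int {
        var s = 8
        while (s < n * 2) s = s shl 1
        return s
    }

    // Hashes chars directly from the buffer; no String is created.
    private fun hash(text: CharArray, off: Int, len: Int): Int {
        var h = 0
        for (i in off until off + len) h = 31 * h + text[i].code
        return h
    }

    private fun sameChars(key: CharArray, text: CharArray, off: Int, len: Int): Boolean {
        if (key.size != len) return false
        for (i in 0 until len) if (key[i] != text[off + i]) return false
        return true
    }

    // Returns the slot holding this key, or the empty slot where it would go.
    private fun slot(text: CharArray, off: Int, len: Int): Int {
        val mask = table.size - 1
        var i = hash(text, off, len) and mask
        while (true) {
            val key = table[i] ?: return i
            if (sameChars(key, text, off, len)) return i
            i = (i + 1) and mask // linear probing
        }
    }

    fun contains(text: CharArray, off: Int, len: Int): Boolean =
        table[slot(text, off, len)] != null

    fun add(word: String): Boolean {
        val chars = word.toCharArray()
        val i = slot(chars, 0, chars.size)
        if (table[i] != null) return false
        table[i] = chars
        size++
        if (size * 2 > table.size) rehash()
        return true
    }

    // Grows the table (never shrinks it) and reinserts every key.
    private fun rehash() {
        val old = table
        table = arrayOfNulls(old.size * 2)
        for (key in old) if (key != null) table[slot(key, 0, key.size)] = key
    }
}
```

A tokenizer's term buffer can then be tested in place, e.g. contains(buffer, 0, termLength), which is the use case the description above has in mind.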

CharFilter

abstract class CharFilter(input: Reader) : Reader

Subclasses of CharFilter can be chained to filter a Reader. They can be used as a Reader with additional offset correction. Tokenizers automatically use correctOffset if a CharFilter subclass is used.
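
Offset correction can be sketched as follows: a filter that deletes characters records, at each output position where deletions occurred, the cumulative number of deleted characters, and correctOffset adds the applicable amount back so token offsets point into the original text. StripCharsSketch is hypothetical and works on a String rather than a Reader; it only mirrors the idea.

```kotlin
// Hypothetical filter that removes every occurrence of one character and can map
// offsets in its output back to offsets in the original input.
class StripCharsSketch(input: String, private val strip: Char) {
    val output: String
    private val offsets = ArrayList<Int>() // output offsets where a new correction starts
    private val diffs = ArrayList<Int>()   // cumulative removed chars from that offset on

    init {
        val sb = StringBuilder()
        var removed = 0
        for (c in input) {
            if (c == strip) {
                removed++
                // Consecutive removals at the same output position update one entry.
                if (offsets.isNotEmpty() && offsets.last() == sb.length) {
                    diffs[diffs.size - 1] = removed
                } else {
                    offsets.add(sb.length)
                    diffs.add(removed)
                }
            } else {
                sb.append(c)
            }
        }
        output = sb.toString()
    }

    // Maps an offset in the filtered output to the corresponding input offset.
    fun correctOffset(off: Int): Int {
        var diff = 0
        for (i in offsets.indices) {
            if (offsets[i] <= off) diff = diffs[i] else break
        }
        return off + diff
    }
}
```

For input "a-b--c", the output is "abc", and correctOffset(2) returns 5, the position of 'c' in the original input.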

CharFilterFactory

abstract class CharFilterFactory : AbstractAnalysisFactory

Abstract parent class for analysis factories that create CharFilter instances.

DelegatingAnalyzerWrapper

abstract class DelegatingAnalyzerWrapper : AnalyzerWrapper

An analyzer wrapper that does not allow wrapping components or readers. Because wrapping is disallowed, thread-local resources can be delegated to the wrapped analyzer instead of also being allocated by this analyzer. This is the base class for all analyzers that simply delegate to another analyzer, e.g. per field name.
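
Delegating per field name can be pictured as choosing a wrapped analyzer by field and handing it the text unchanged. In the sketch below an "analyzer" is reduced to a function from text to terms; TextAnalyzer and PerFieldDelegateSketch are hypothetical and do not model the reuse strategy or thread-local resources described above.

```kotlin
// Hypothetical: an "analyzer" reduced to a function from text to a list of terms.
typealias TextAnalyzer = (String) -> List<String>

class PerFieldDelegateSketch(
    private val fallback: TextAnalyzer,
    private val perField: Map<String, TextAnalyzer>
) {
    // Picks the delegate for the field; nothing is wrapped around its output.
    private fun delegateFor(field: String): TextAnalyzer = perField[field] ?: fallback

    fun analyze(field: String, text: String): List<String> = delegateFor(field)(text)
}
```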

FilteringTokenFilter

abstract class FilteringTokenFilter(in: TokenStream?) : TokenFilter

Abstract base class for TokenFilters that may remove tokens. Subclasses implement accept, which returns true if the current token should be kept; incrementToken uses this method to decide whether to pass a token on to the caller.
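
The accept() contract can be sketched with a hypothetical minimal stream interface. The loop keeps pulling from the input until accept() returns true and, as Lucene's FilteringTokenFilter does, adds the position increments of dropped tokens to the next kept token so that position gaps survive. TermStream, FixedTermStream, FilteringSketch and MinLengthSketch are all illustrative names, not this package's API.

```kotlin
// Hypothetical minimal stream interface; not the real TokenStream API.
interface TermStream {
    val term: String
    val positionIncrement: Int
    fun incrementToken(): Boolean
}

// Source stream over a fixed list; every token has position increment 1.
class FixedTermStream(private val terms: List<String>) : TermStream {
    private var index = -1
    override val term: String get() = terms[index]
    override val positionIncrement: Int get() = 1
    override fun incrementToken(): Boolean = ++index < terms.size
}

// Mirrors the FilteringTokenFilter contract: subclasses decide via accept().
abstract class FilteringSketch(private val input: TermStream) : TermStream {
    private var skippedPositions = 0

    override val term: String get() = input.term
    override val positionIncrement: Int get() = input.positionIncrement + skippedPositions

    // Return true to keep the current token.
    protected abstract fun accept(): Boolean

    override fun incrementToken(): Boolean {
        skippedPositions = 0
        while (input.incrementToken()) {
            if (accept()) return true
            // Remember the gap left by the dropped token.
            skippedPositions += input.positionIncrement
        }
        return false
    }
}

// Example subclass: keeps only terms with at least minLength characters.
class MinLengthSketch(input: TermStream, private val minLength: Int) : FilteringSketch(input) {
    override fun accept(): Boolean = term.length >= minLength
}
```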

GraphTokenFilter

abstract class GraphTokenFilter(input: TokenStream) : TokenFilter

An abstract TokenFilter that exposes its input stream as a graph.

LowerCaseFilter

open class LowerCaseFilter(in: TokenStream) : TokenFilter

Normalizes token text to lower case.

ReusableStringReader

class ReusableStringReader : Reader

Internal class to enable reuse of the string reader by

StopFilter

open class StopFilter(in: TokenStream, stopWords: CharArraySet) : FilteringTokenFilter

Removes stop words from a token stream.

StopwordAnalyzerBase

abstract class StopwordAnalyzerBase : Analyzer

Base class for Analyzers that need to make use of stopword sets.

TokenFilter

abstract class TokenFilter : TokenStream, Unwrappable<TokenStream>

A TokenFilter is a TokenStream whose input is another TokenStream.

TokenFilterFactory

abstract class TokenFilterFactory : AbstractAnalysisFactory

Abstract parent class for analysis factories that create TokenFilter instances.

Tokenizer

abstract class Tokenizer : TokenStream

A Tokenizer is a TokenStream whose input is a Reader.

TokenizerFactory

abstract class TokenizerFactory : AbstractAnalysisFactory

Abstract parent class for analysis factories that create Tokenizer instances.

TokenStream

abstract class TokenStream : AttributeSource, AutoCloseable

A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
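
Consumers drive a token stream in a fixed order, following the workflow documented for Lucene's TokenStream: call reset(), call incrementToken() until it returns false, call end(), and finally close(). The sketch below shows that order with a hypothetical SpaceTokenStream that splits on spaces; it is not the real TokenStream class and has no attributes.

```kotlin
// Hypothetical minimal stream that splits on spaces; not the real TokenStream class.
class SpaceTokenStream(private val text: String) {
    private var words: Iterator<String> = emptyList<String>().iterator()
    var term: String = ""
        private set
    var closed = false
        private set

    // Prepares the stream; consumers call this before the first incrementToken().
    fun reset() {
        words = text.split(' ').filter { it.isNotEmpty() }.iterator()
    }

    // Advances to the next token; returns false once the input is exhausted.
    fun incrementToken(): Boolean {
        if (!words.hasNext()) return false
        term = words.next()
        return true
    }

    // End-of-stream hook (a real stream would, e.g., set the final offset here).
    fun end() {
        term = ""
    }

    // Releases resources.
    fun close() {
        closed = true
    }
}

// Consumer side: reset, loop over incrementToken, end, and always close.
fun collectTerms(stream: SpaceTokenStream): List<String> {
    val terms = mutableListOf<String>()
    try {
        stream.reset()
        while (stream.incrementToken()) terms.add(stream.term)
        stream.end()
    } finally {
        stream.close()
    }
    return terms
}
```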

TokenStreamToAutomaton

open class TokenStreamToAutomaton

Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP and for holes we insert HOLE.
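
A simplified picture of the labeling, for the case where tokens form a single path (no stacked synonyms): each term contributes its UTF-8 bytes, a separator label goes between tokens, and each skipped position contributes a hole label. This only illustrates the idea. It does not build an Automaton, and the constant values, PositionedTerm and the placement of the hole label relative to the separator are assumptions, not what TokenStreamToAutomaton actually emits.

```kotlin
// Illustrative label values only; the real class defines its own POS_SEP and HOLE constants.
const val POS_SEP_LABEL = 0x1F
const val HOLE_LABEL = 0x1E

// Hypothetical token: term text plus its position increment.
data class PositionedTerm(val term: String, val posInc: Int = 1)

// Flattens a linear token sequence into transition labels:
// UTF-8 bytes for each term, a separator between tokens, and a hole per skipped position.
fun linearLabels(tokens: List<PositionedTerm>): List<Int> {
    val labels = mutableListOf<Int>()
    for ((index, token) in tokens.withIndex()) {
        if (index > 0) labels.add(POS_SEP_LABEL)
        repeat(token.posInc - 1) {
            labels.add(HOLE_LABEL)
            labels.add(POS_SEP_LABEL)
        }
        // Bytes are added as unsigned values (0..255).
        token.term.encodeToByteArray().forEach { labels.add(it.toInt() and 0xFF) }
    }
    return labels
}
```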

WordlistLoader

object WordlistLoader

Loader for text files that represent a list of stopwords.
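
The file formats WordlistLoader accepts are not described here. As an assumption, the sketch below treats a list as one word per line, trims surrounding whitespace, and skips blank lines and lines starting with a comment prefix. parseWordList is a hypothetical helper, not a WordlistLoader method.

```kotlin
// Hypothetical parser for a plain-text stopword list: one word per line,
// surrounding whitespace trimmed, blank lines and comment lines skipped.
fun parseWordList(text: String, commentPrefix: String = "#"): Set<String> =
    text.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith(commentPrefix) }
        .toSet()
```

The resulting words would typically be loaded into a CharArraySet and passed to a StopFilter.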