CustomAnalyzer

A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
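Because the analyzer is assembled from factories, it can be thought of as an ordered pipeline. Char filters rewrite the raw text, the tokenizer splits it into tokens, and the token filters transform those tokens in the order they were added. The sketch below models that ordering in plain Java. `TinyAnalyzer` and its builder are hypothetical names, not part of this library. The sketch also works on whole strings and lists rather than on Lucene's streaming `TokenStream`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical miniature of a configurable analysis chain; not this library's API.
final class TinyAnalyzer {
    private final List<UnaryOperator<String>> charFilters;        // rewrite the raw text
    private final Function<String, List<String>> tokenizer;      // split text into tokens
    private final List<UnaryOperator<List<String>>> tokenFilters; // transform the tokens

    private TinyAnalyzer(Builder b) {
        this.charFilters = List.copyOf(b.charFilters);
        this.tokenizer = b.tokenizer;
        this.tokenFilters = List.copyOf(b.tokenFilters);
    }

    static Builder builder() { return new Builder(); }

    List<String> analyze(String text) {
        // 1. Char filters run first, on the raw text.
        for (UnaryOperator<String> cf : charFilters) text = cf.apply(text);
        // 2. The tokenizer turns the filtered text into tokens.
        List<String> tokens = tokenizer.apply(text);
        // 3. Token filters run in the order they were added.
        for (UnaryOperator<List<String>> tf : tokenFilters) tokens = tf.apply(tokens);
        return tokens;
    }

    static final class Builder {
        private final List<UnaryOperator<String>> charFilters = new ArrayList<>();
        private Function<String, List<String>> tokenizer;
        private final List<UnaryOperator<List<String>>> tokenFilters = new ArrayList<>();

        Builder addCharFilter(UnaryOperator<String> f) { charFilters.add(f); return this; }
        Builder withTokenizer(Function<String, List<String>> t) { tokenizer = t; return this; }
        Builder addTokenFilter(UnaryOperator<List<String>> f) { tokenFilters.add(f); return this; }

        TinyAnalyzer build() {
            // In this sketch, a chain without a tokenizer is rejected.
            if (tokenizer == null) throw new IllegalStateException("a tokenizer is required");
            return new TinyAnalyzer(this);
        }
    }
}
```

The builder calls have the same shape as the `CustomAnalyzer` calls below. The difference is that the real builder takes component names and string parameters rather than lambdas.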

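You can create an instance of this Analyzer with the builder by passing it the SPI names of its components. These are the names under which the factory classes are registered for lookup through Java's `ServiceLoader` mechanism, and each factory also exposes its name as a `NAME` constant: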

Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer(StandardTokenizerFactory.NAME)
    .addTokenFilter(LowerCaseFilterFactory.NAME)
    .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();

Parameters are passed to each component as alternating key/value strings. They are the same parameters that Apache Solr uses, and they are documented on the corresponding factory classes. See the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

The following is equivalent to the example above, with the SPI names written as string literals instead of `NAME` constants:

Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();
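In both examples, component parameters are given as a flat sequence of alternating keys and values (`"ignoreCase", "false", "words", "stopwords.txt", ...`). The following is a minimal sketch of turning such a varargs list into a parameter map. It rejects an odd number of arguments and, as a choice made for this sketch only, rejects repeated keys. `ParamPairs.toMap` is a hypothetical helper, not part of the documented API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts alternating key/value strings into an ordered map.
final class ParamPairs {
    private ParamPairs() {}

    static Map<String, String> toMap(String... params) {
        // Every key needs a value, so the argument count must be even.
        if (params.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Key-value pairs expected, so the number of params must be even.");
        }
        Map<String, String> map = new LinkedHashMap<>(); // keeps the caller's order
        for (int i = 0; i < params.length; i += 2) {
            String key = params[i];
            // This sketch treats a repeated key as a configuration mistake.
            if (map.containsKey(key)) {
                throw new IllegalArgumentException("Duplicate parameter: " + key);
            }
            map.put(key, params[i + 1]);
        }
        return map;
    }
}
```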

The names that can be used for components can be looked up through `TokenizerFactory.availableTokenizers`, `TokenFilterFactory.availableTokenFilters`, and `CharFilterFactory.availableCharFilters`.
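Name-based lookup amounts to a registry that maps short names to factories and can list the names it knows. The sketch below illustrates this with a hypothetical `FactoryRegistry` class. It is not the library's `ServiceLoader`-backed loader. Matching names case-insensitively is an assumption of this sketch, and `available()` only plays a role similar to the `available*` lookups above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical name -> factory registry; not the library's actual loader.
final class FactoryRegistry<T> {
    // Keys are stored lower-cased, so lookups in this sketch ignore case.
    private final Map<String, Supplier<? extends T>> factories = new TreeMap<>();

    void register(String name, Supplier<? extends T> factory) {
        String key = name.toLowerCase(Locale.ROOT);
        if (factories.putIfAbsent(key, factory) != null) {
            throw new IllegalArgumentException("Name already registered: " + name);
        }
    }

    T create(String name) {
        Supplier<? extends T> factory = factories.get(name.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException(
                "Unknown name '" + name + "'; available: " + available());
        }
        return factory.get();
    }

    // Registered names in sorted order (TreeMap keeps its keys sorted).
    Set<String> available() {
        return Collections.unmodifiableSet(factories.keySet());
    }
}
```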

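You can create conditional branches in the analyzer with `Builder.when` and `Builder.whenTerm`. The token filters added between the condition and `endwhen()` are applied only to tokens that meet the condition: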

Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .whenTerm(t -> t.length() > 10)
      .addTokenFilter("reversestring")
    .endwhen()
    .build();
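Conceptually, a `whenTerm` branch runs its wrapped filters only on tokens that satisfy the predicate, and all other tokens pass through unchanged. The sketch below models this on a plain list of strings. `ConditionalStep` and its `reverse` method are hypothetical. `reverse` stands in for the "reversestring" filter in the example above, and the sketch does not attempt the streaming behavior of the real ConditionalTokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical model of a conditional branch: the wrapped per-token filter
// runs only on tokens that match the condition.
final class ConditionalStep {
    private ConditionalStep() {}

    static List<String> apply(List<String> tokens,
                              Predicate<CharSequence> condition,
                              UnaryOperator<String> filter) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            // Matching tokens go through the filter; the rest are kept as-is.
            out.add(condition.test(token) ? filter.apply(token) : token);
        }
        return out;
    }

    // Reverses a token; a stand-in for a string-reversing token filter.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }
}
```

For example, with the condition `t -> t.length() > 10` from the analyzer above, `"short"` passes through untouched, while `"internationalization"` comes out reversed.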

Since

5.0.0

Types

class Builder

Builder for CustomAnalyzer.

object Companion

Factory class for a ConditionalTokenFilter

Properties

Returns the list of char filters that are used in this analyzer.

Returns the list of token filters that are used in this analyzer.

Returns the tokenizer that is used in this analyzer.

Functions

open override fun close()
open override fun getOffsetGap(fieldName: String?): Int
open override fun getPositionIncrementGap(fieldName: String?): Int
fun normalize(fieldName: String, text: String): BytesRef
fun tokenStream(fieldName: String, text: String): TokenStream
fun tokenStream(fieldName: String, reader: Reader): TokenStream
open override fun toString(): String