Package-level declarations

Types

Base class for testing TokenStream factories.

Base class for all Lucene unit tests that use TokenStreams.

TokenStream from a canned list of Tokens.

class CrankyTokenFilter(input: TokenStream, val random: Random) : TokenFilter

Throws IOException from random TokenStream methods.

Simplified Kotlin port of Lucene's LookaheadTokenFilter.

class MockAnalyzer(random: Random, runAutomaton: CharacterRunAutomaton, lowerCase: Boolean, filter: CharacterRunAutomaton) : Analyzer

Analyzer for testing purposes.

Analyzer for testing that encodes terms as UTF-16 bytes.

class MockCharFilter(in: Reader, val remainder: Int = 0) : CharFilter

The purpose of this CharFilter is to send offsets out of bounds if the analyzer doesn't use correctOffset or does incorrect offset math.

TokenFilter that adds random fixed-length payloads.

Randomly inserts overlapping tokens with variable position length.

Randomly injects holes (similar to what a stop filter would do).

Wraps a whitespace tokenizer with a filter that sets the position increment to 1 for the first token and for odd tokens, and to 0 for all others, encoding the position as pos: XXX in the payload.
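
The position-increment rule in this description can be modeled without any analysis classes. The following is a minimal sketch, not the port's implementation: `SketchToken` and `assignPositions` are hypothetical names, "odd tokens" is read as odd zero-based token indexes, and the payload is assumed to record the running position counter before the current increment is added.

```kotlin
// Hypothetical, library-free model of the rule described above.
// None of these names come from the package itself.
data class SketchToken(val term: String, val posInc: Int, val payload: String)

fun assignPositions(text: String): List<SketchToken> {
    var pos = 0 // running position counter (assumed to start at 0)
    return text.split(Regex("\\s+"))
        .filter { it.isNotEmpty() } // whitespace tokenization
        .mapIndexed { i, term ->
            // First token and odd-indexed tokens advance the position;
            // all other tokens are stacked on the previous position.
            val posInc = if (i == 0 || i % 2 == 1) 1 else 0
            // Assumption: the payload holds the counter value before this increment.
            val payload = "pos: $pos"
            pos += posInc
            SketchToken(term, posInc, payload)
        }
}

fun main() {
    assignPositions("a b c d e").forEach { println(it) }
}
```

Tokens with a position increment of 0 share a position with the token before them, which is why a fixture like this is useful for exercising code that must handle stacked tokens.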

class MockReaderWrapper(random: Random, in: Reader) : Reader

Wraps a Reader; it can throw random or fixed exceptions and spoon-feed read chars.

Adds a synonym of "dog" for "dogs", and a synonym of "cavy" for "guinea pig".

Adds a synonym of "dog" for "dogs", and a synonym of "cavy" for "guinea pig".

A token filter for testing that removes terms accepted by a DFA.

Tokenizer for testing.

Extension of CharTermAttributeImpl that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.

TokenFilter that adds random variable-length payloads.

A Token is an occurrence of a term from the text of a field. It consists of the term's text, start and end offsets, and optionally flags and payload.

class TokenStreamToDot(inputText: String, in: TokenStream, out: PrintWriter)

Consumes a TokenStream and outputs the dot (graphviz) string (graph).

Simple example of a filter that exercises LookaheadTokenFilter.

A TokenFilter that checks the consistency of the tokens (e.g., that offsets are consistent with one another).