Package-level declarations
Types
Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.
Helper class for loading named SPIs from classpath (e.g. Tokenizers, TokenStreams).
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
Extension to Analyzer suitable for Analyzers which wrap other Analyzers.
Converts an Automaton into a TokenStream.
This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. It caches all token attribute states locally in a List when incrementToken() is first called. Subsequent calls will use the cache.
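The caching behaviour described above can be illustrated without any Lucene dependency. The following is a minimal sketch in plain Java, assuming tokens are plain strings rather than attribute states; the class `CachingTokens` and its methods `term()` and `drain()` are hypothetical names chosen for illustration and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a caching token filter; not a Lucene class. */
final class CachingTokens {
    private final Iterator<String> source; // underlying stream, consumed at most once
    private List<String> cache;            // filled lazily on the first incrementToken()
    private int position;                  // index of the next cached token to return
    private String current;

    CachingTokens(Iterator<String> source) {
        this.source = source;
    }

    /** Advances to the next token; returns false when the tokens are exhausted. */
    boolean incrementToken() {
        if (cache == null) {
            // First call: drain the source into the cache.
            cache = new ArrayList<>();
            while (source.hasNext()) {
                cache.add(source.next());
            }
        }
        if (position >= cache.size()) {
            return false;
        }
        current = cache.get(position++);
        return true;
    }

    /** Returns the token selected by the last successful incrementToken(). */
    String term() {
        return current;
    }

    /** Rewinds to the start of the cache so the tokens can be consumed again. */
    void reset() {
        position = 0;
    }

    /** Collects all remaining tokens into a list. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (incrementToken()) {
            out.add(term());
        }
        return out;
    }

    public static void main(String[] args) {
        CachingTokens tokens = new CachingTokens(List.of("quick", "brown", "fox").iterator());
        System.out.println(tokens.drain()); // first pass reads the source
        tokens.reset();
        System.out.println(tokens.drain()); // second pass is served from the cache
    }
}
```

The source iterator is read only during the first pass; after `reset()` every token comes from the list, which is the property that makes repeated consumption possible.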
Utility class to write tokenizers or token filters.
A simple class that stores key Strings as char[]'s in a hash table. Note that this is not a general purpose class. For example, it cannot remove items from the map, nor does it resize its hash table to be smaller, etc. It is designed to be quick to retrieve items by char[] keys without the necessity of converting to a String first.
A simple class that stores Strings as char[]'s in a hash table. Note that this is not a general purpose class. For example, it cannot remove items from the set, nor does it resize its hash table to be smaller, etc. It is designed to be quick to test if a char[] is in the set without the necessity of converting it to a String first.
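To make the "no String conversion" idea concrete, here is a rough sketch in plain Java of a grow-only hash set keyed by char[] slices. The class `CharSliceSet`, its linear-probing layout, and its hash function are assumptions made for illustration; the actual class may be organised differently.

```java
/** Hypothetical grow-only set of char[] keys; not the library implementation. */
final class CharSliceSet {
    private char[][] slots = new char[8][]; // capacity is always a power of two
    private int size;

    /** Hashes a slice of a char[] without building a String. */
    private static int hash(char[] text, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + text[i];
        }
        return h;
    }

    /** Compares a stored key with a slice of a buffer, character by character. */
    private static boolean same(char[] key, char[] text, int off, int len) {
        if (key.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (key[i] != text[off + i]) {
                return false;
            }
        }
        return true;
    }

    /** Finds the slot holding the slice, or the empty slot where it would be stored. */
    private int slotFor(char[] text, int off, int len) {
        int mask = slots.length - 1;
        int pos = hash(text, off, len) & mask;
        while (slots[pos] != null && !same(slots[pos], text, off, len)) {
            pos = (pos + 1) & mask; // linear probing
        }
        return pos;
    }

    /** Tests membership of text[off, off + len) directly on the buffer. */
    boolean contains(char[] text, int off, int len) {
        return slots[slotFor(text, off, len)] != null;
    }

    /** Adds a word; returns false if it was already present. */
    boolean add(String word) {
        char[] key = word.toCharArray();
        int pos = slotFor(key, 0, key.length);
        if (slots[pos] != null) {
            return false;
        }
        slots[pos] = key;
        if (++size * 2 > slots.length) {
            grow(); // keep the table at most half full; it never shrinks
        }
        return true;
    }

    private void grow() {
        char[][] old = slots;
        slots = new char[old.length * 2][];
        for (char[] key : old) {
            if (key != null) {
                slots[slotFor(key, 0, key.length)] = key;
            }
        }
    }

    public static void main(String[] args) {
        CharSliceSet stopWords = new CharSliceSet();
        stopWords.add("the");
        stopWords.add("an");
        char[] buffer = "an apple".toCharArray();
        System.out.println(stopWords.contains(buffer, 0, 2)); // slice "an"
        System.out.println(stopWords.contains(buffer, 3, 5)); // slice "apple"
    }
}
```

A tokenizer typically already holds the term text in a char[] buffer, so a lookup of this shape avoids allocating a String per token. As in the description above, there is no remove operation and the table only grows.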
Subclasses of CharFilter can be chained to filter a Reader. They can be used as a Reader with additional offset correction. Tokenizers will automatically use correctOffset(int) if a CharFilter subclass is used.
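Offset correction matters because a filter that removes or inserts characters shifts every later position, while highlighting and similar features need offsets into the original text. The sketch below is a plain-Java illustration of that idea only: the class `OffsetCorrectionSketch` is hypothetical, it drops a single character instead of wrapping a Reader, and it stores one original offset per kept character, which is an assumption made for simplicity rather than the library's bookkeeping.

```java
import java.util.Arrays;

/** Hypothetical illustration of offset correction; not a CharFilter subclass. */
final class OffsetCorrectionSketch {
    private final String filtered;
    // originalOffsets[i] is the offset in the input of filtered character i;
    // the last entry maps the end of the filtered text to the end of the input.
    private final int[] originalOffsets;

    OffsetCorrectionSketch(String input, char dropped) {
        StringBuilder out = new StringBuilder();
        int[] offsets = new int[input.length() + 1];
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != dropped) {
                offsets[out.length()] = i; // remember where this kept character came from
                out.append(c);
            }
        }
        offsets[out.length()] = input.length();
        this.filtered = out.toString();
        this.originalOffsets = Arrays.copyOf(offsets, out.length() + 1);
    }

    /** The text a downstream tokenizer would see. */
    String filtered() {
        return filtered;
    }

    /** Maps an offset in the filtered text back to an offset in the original input. */
    int correctOffset(int filteredOffset) {
        return originalOffsets[filteredOffset];
    }

    public static void main(String[] args) {
        OffsetCorrectionSketch filter = new OffsetCorrectionSketch("a-b--cd", '-');
        System.out.println(filter.filtered());
        // "cd" starts at offset 2 in the filtered text and at offset 5 in the input.
        System.out.println(filter.correctOffset(2));
    }
}
```

In this simplified mapping an offset that lands immediately after dropped characters is reported past them, so it is best read as an illustration of start-offset correction.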
Abstract parent class for analysis factories that create CharFilter instances.
An analyzer wrapper that does not allow wrapping of components or readers. Because wrapping is disallowed, the thread-local resources can be delegated to the delegate analyzer instead of also being allocated on this analyzer. This wrapper class is the base class of all analyzers that just delegate to another analyzer, e.g. per field name.
Abstract base class for TokenFilters that may remove tokens. You have to implement accept() and return true if the current token should be preserved. incrementToken() uses this method to decide if a token should be passed to the caller.
An abstract TokenFilter that exposes its input stream as a graph.
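The accept/incrementToken contract can be sketched in plain Java as follows. This is an illustration under simplifying assumptions, not the library class: `FilteringSketch`, `term()`, and `withoutStopWords` are hypothetical names, tokens are plain strings, and the position-increment bookkeeping a real token filter would need is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a filter that may drop tokens; not a Lucene class. */
abstract class FilteringSketch {
    private final Iterator<String> input;
    private String current;

    FilteringSketch(Iterator<String> input) {
        this.input = input;
    }

    /** Returns true if the given token should be preserved. */
    protected abstract boolean accept(String token);

    /** Advances to the next accepted token; returns false at the end of the input. */
    final boolean incrementToken() {
        while (input.hasNext()) {
            String candidate = input.next();
            if (accept(candidate)) {
                current = candidate;
                return true;
            }
            // Rejected tokens are skipped silently.
        }
        return false;
    }

    /** Returns the token selected by the last successful incrementToken(). */
    final String term() {
        return current;
    }

    /** Example subclass use: keep every token that is not in the stop set. */
    static List<String> withoutStopWords(List<String> tokens, Set<String> stopWords) {
        FilteringSketch filter = new FilteringSketch(tokens.iterator()) {
            @Override
            protected boolean accept(String token) {
                return !stopWords.contains(token);
            }
        };
        List<String> kept = new ArrayList<>();
        while (filter.incrementToken()) {
            kept.add(filter.term());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(withoutStopWords(List.of("the", "quick", "fox"), Set.of("the")));
    }
}
```

Subclasses supply only the keep-or-drop decision; the loop that skips rejected tokens lives once in the base class.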
Normalizes token text to lower case.
Removes stop words from a token stream.
Base class for Analyzers that need to make use of stopword sets.
A TokenFilter is a TokenStream whose input is another TokenStream.
Abstract parent class for analysis factories that create TokenFilter instances.
A Tokenizer is a TokenStream whose input is a Reader.
Abstract parent class for analysis factories that create Tokenizer instances.
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP and for holes we insert HOLE.
Loader for text files that represent a list of stopwords.
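As a rough illustration of what loading such a file involves, the plain-Java sketch below reads one word per line from a Reader. The file layout (one word per line, blank lines ignored, lines starting with a comment prefix skipped) is an assumption for the example, and `WordlistSketch.load` is a hypothetical helper rather than the loader's actual API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stopword-file reader; the file format is assumed, not specified. */
final class WordlistSketch {
    /** Reads one trimmed word per line, skipping blank lines and comment lines. */
    static Set<String> load(Reader reader, String commentPrefix) {
        Set<String> words = new LinkedHashSet<>(); // keeps file order, drops duplicates
        try (BufferedReader lines = new BufferedReader(reader)) {
            String line;
            while ((line = lines.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty() && !word.startsWith(commentPrefix)) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String file = "# English stopwords\nthe\n  and \n\nof\n";
        System.out.println(load(new StringReader(file), "#"));
    }
}
```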