Package-level declarations
Types
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
Factory for DecimalDigitFilter.
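To make the folding rule concrete, here is a minimal sketch in plain Java that uses only the standard library. It is not the filter's implementation: the class name `DigitFoldingSketch` and the method `foldDigits` are hypothetical, and the sketch works on a whole string rather than on a token stream.

```java
final class DigitFoldingSketch {
    /** Replaces every code point in General_Category=Decimal_Number (Nd) with its ASCII digit. */
    static String foldDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // For Nd code points, Character.digit(cp, 10) yields the value 0..9.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                // Everything else, including non-decimal numerics such as U+00B2, is kept as-is.
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits 1, 2, 3 followed by fullwidth digits 4, 5.
        System.out.println(foldDigits("\u0661\u0662\u0663 and \uFF14\uFF15")); // prints "123 and 45"
    }
}
```

Characters that look numeric but belong to another category, such as SUPERSCRIPT TWO (U+00B2, category No), pass through this sketch unchanged.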
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.
Factory for FlattenGraphFilter.
"Tokenizes" the entire stream as a single token. This is useful for data like zip codes, IDs, and some product names.
Emits the entire input as a single token.
Factory for KeywordTokenizer.
A LetterTokenizer is a tokenizer that divides text at non-letters. That is to say, it defines tokens as maximal strings of adjacent letters, as defined by the Character.isLetter() predicate.
Factory for LetterTokenizer.
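The "maximal strings of adjacent letters" rule can be sketched as follows. This is an illustrative, standard-library-only approximation, not the tokenizer's code: `LetterTokenizerSketch` and `tokenize` are hypothetical names, and the sketch ignores offsets, attributes, and any token length limit the real class may apply.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterTokenizerSketch {
    /** Splits text into maximal runs of code points for which Character.isLetter is true. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! it's 42nd")); // prints [Hello, world, it, s, nd]
    }
}
```

Note that apostrophes and digits are non-letters under this rule, so "it's" becomes two tokens and "42nd" yields only "nd".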
Normalizes token text to lower case.
Factory for LowerCaseFilter.
An Analyzer that filters LetterTokenizer with LowerCaseFilter.
Filters LetterTokenizer with CoreLowerCaseFilter and CoreStopFilter.
Removes stop words from a token stream.
Factory for StopFilter.
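As a rough illustration of stop-word removal, the sketch below drops every token that appears in a stop set. It assumes tokens are plain strings and does not model token positions or other attributes; `StopFilterSketch` and `removeStopWords` are hypothetical names, not part of this package.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class StopFilterSketch {
    /** Returns the tokens that are not in stopWords, preserving their order. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(token -> !stopWords.contains(token)) // exact, case-sensitive match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "fox", "and", "the", "dog");
        System.out.println(removeStopWords(tokens, Set.of("the", "and"))); // prints [quick, fox, dog]
    }
}
```

Because the match in this sketch is case-sensitive, "The" would survive a stop set containing only "the". That is one reason to lower-case tokens before removing stop words.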
Removes tokens whose types appear in a set of blocked types from a token stream.
Factory class for TypeTokenFilter.
An Analyzer that uses UnicodeWhitespaceTokenizer.
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens (according to Unicode's White_Space property).
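Splitting on the Unicode White_Space property can be approximated with a standard Java regular expression, as in the sketch below. This is an assumption-laden illustration rather than the tokenizer's code: `UnicodeWhitespaceSketch` and `tokenize` are hypothetical names, and the sketch relies on `java.util.regex.Pattern` supporting the binary property `IsWhite_Space`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeWhitespaceSketch {
    // One or more code points that do NOT have the Unicode White_Space property.
    private static final Pattern NON_WHITESPACE_RUN = Pattern.compile("\\P{IsWhite_Space}+");

    /** Returns each maximal run of non-whitespace characters as a token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = NON_WHITESPACE_RUN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+00A0 (no-break space) and U+3000 (ideographic space) both have White_Space.
        System.out.println(tokenize("  foo\u00A0bar\tbaz\u3000qux ")); // prints [foo, bar, baz, qux]
    }
}
```

The choice of whitespace definition matters for characters such as the no-break space: U+00A0 has the White_Space property, whereas Java's `Character.isWhitespace` returns false for it, so a tokenizer built on that method would keep "foo\u00A0bar" as one token.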
Normalizes token text to UPPER CASE.
Factory for UpperCaseFilter.
An Analyzer that uses WhitespaceTokenizer.
Factory for WhitespaceTokenizer.