UAX29URLEmailAnalyzer
Filters UAX29URLEmailTokenizer with org.gnit.lucenekmp.analysis.LowerCaseFilter and org.gnit.lucenekmp.analysis.StopFilter, using a list of English stop words.
Since
3.6.0
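The filter chain described above can be pictured with a minimal, library-free sketch. It assumes whitespace splitting in place of UAX29URLEmailTokenizer (which follows the Unicode text segmentation rules and keeps URLs and email addresses as single tokens) and a tiny hypothetical stop-word set rather than the analyzer's actual default list. The names `analyzeSketch` and `sampleStopWords` are illustrative and are not part of the library.

```kotlin
// Hypothetical stop-word set for illustration only; not the analyzer's real default list.
val sampleStopWords: Set<String> = setOf("the", "a", "to", "and")

// Rough stand-in for the chain: tokenize, lower-case, then drop stop words.
// Whitespace splitting is an assumption; the real tokenizer is far more precise.
fun analyzeSketch(text: String, stopWords: Set<String> = sampleStopWords): List<String> =
    text.split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .map { it.lowercase() }
        .filter { it !in stopWords }

fun main() {
    // Prints [send, report, alice@example.com]
    println(analyzeSketch("Send THE report to Alice@Example.com"))
}
```

Lower-casing runs before stop-word removal in this sketch, so "THE" is matched against the lower-case stop-word set.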
Constructors
Builds an analyzer with the given stop words.
constructor()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
Builds an analyzer with the stop words from the given reader.
Functions
Sets the maximum allowed token length. Tokens longer than this are split at this length and emitted as multiple tokens. To skip such long tokens instead, increase the maximum length and then use LengthFilter to remove them. The default is UAX29URLEmailAnalyzer.DEFAULT_MAX_TOKEN_LENGTH.
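The chopping behavior and the skip strategy can be illustrated with a small, library-free sketch. It does not call the analyzer. `chopToken` is a hypothetical helper that mimics splitting an over-long token at the maximum length, and the plain `filter` call stands in for what LengthFilter would do; the limits 4 and 64 are arbitrary values chosen for the example.

```kotlin
// Hypothetical helper: splits an over-long token into pieces of at most maxTokenLength characters.
fun chopToken(token: String, maxTokenLength: Int): List<String> {
    require(maxTokenLength > 0) { "maxTokenLength must be positive" }
    return token.chunked(maxTokenLength)
}

fun main() {
    val longToken = "abcdefghij" // 10 characters

    // With a limit of 4, the token is emitted as three pieces.
    println(chopToken(longToken, 4)) // [abcd, efgh, ij]

    // Skip strategy: raise the limit so long tokens survive whole,
    // then drop anything longer than the length actually wanted.
    val maxWanted = 4
    val kept = listOf("short", "abc", longToken)
        .flatMap { chopToken(it, 64) }
        .filter { it.length <= maxWanted }
    println(kept) // [abc]
}
```

With the low limit, fragments such as "ij" would reach the index as separate tokens. Raising the limit and filtering afterwards removes the long token entirely.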