Package-level declarations
Types
Filters UAX29URLEmailTokenizer with org.gnit.lucenekmp.analysis.LowerCaseFilter and org.gnit.lucenekmp.analysis.StopFilter, using a list of English stop words.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Factory for UAX29URLEmailTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
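
The analyzer chain described above (tokenize while keeping URLs and email addresses whole, then lower-case, then drop English stop words) can be illustrated with a minimal, self-contained sketch. It does not use the lucenekmp API: `analyze`, `tokenPattern`, and `stopWords` are hypothetical names, the regular expression is only a rough approximation of the UAX #29 and RFC rules, and the stop list is a small made-up sample, not the library's English set.

```kotlin
// Illustrative stand-in for the analysis chain; NOT the lucenekmp API.

// Hypothetical sample stop list (assumption); the real analyzer supplies its own English set.
val stopWords = setOf("a", "an", "and", "at", "is", "the", "to")

// Rough patterns for illustration only (assumption): a URL, then an email address,
// then a plain word. The real tokenizer follows UAX #29 and the relevant RFCs.
val tokenPattern = Regex("""https?://[^\s]+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\w+""")

fun analyze(text: String): List<String> =
    tokenPattern.findAll(text)
        .map { it.value }                  // tokenizer stage: URLs and emails stay whole
        .map { it.lowercase() }            // lower-case filter stage
        .filter { it !in stopWords }       // stop-word filter stage
        .toList()

fun main() {
    println(analyze("Mail Bob@Example.com and visit https://example.org/docs"))
}
```

The order of the stages matters in this sketch: lower-casing runs before the stop-word check, so a capitalized "The" is still recognized as a stop word.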