Package-level declarations
Types
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
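The filter chain described above (tokenize, lowercase, then drop stop words) can be illustrated with a small standalone sketch. This is not this package's API: `AnalyzerSketch` and `analyze` are hypothetical names, and the regex split is a crude stand-in for StandardTokenizer, assumed here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {
    // Hypothetical helper (not part of the documented package):
    // 1) split on runs of characters that are neither letters nor digits,
    // 2) lowercase each token,
    // 3) drop tokens found in the caller-supplied stop-word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split yields a leading empty string if text starts with a delimiter
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The stop-word list is configurable: pass any set of lowercase words.
        System.out.println(analyze("The Quick Brown Fox", Set.of("the", "a", "an")));
        // prints [quick, brown, fox]
    }
}
```

Lowercasing runs before the stop-word check, so a lowercase stop list also matches capitalized input such as "The".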
A grammar-based tokenizer constructed with JFlex.
Factory for StandardTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
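As a rough illustration of word-boundary segmentation, the sketch below uses the JDK's `java.text.BreakIterator` rather than this class. The JDK's word rules resemble, but are not guaranteed to match, UAX #29, so treat this as an approximation of the idea. `WordBreakSketch` and `words` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakSketch {
    // Hypothetical helper: walks the word boundaries reported by the JDK and
    // keeps only segments that contain at least one letter or digit, which
    // discards the whitespace and punctuation segments between words.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world"));
    }
}
```

A boundary iterator reports every segment, including spaces and punctuation, so deciding which segments count as tokens is a separate step from finding the boundaries.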