Package-level declarations

Types

Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
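
As a rough illustration of that chain, here is a self-contained Java sketch. It does not use the Lucene API. The class and its `tokenize` and `analyze` methods are hypothetical, and splitting on runs of characters that are neither letters nor digits is an assumed stand-in for StandardTokenizer's much richer rules. In the sketch, lowercasing runs before the stop-word check, so the configurable stop list only needs lowercase entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {

    // Hypothetical stand-in for StandardTokenizer: splits on any run of
    // characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading delimiter yields an empty piece
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Hypothetical chain: tokenize, lowercase each token (the role of
    // LowerCaseFilter), then drop configured stop words (the role of StopFilter).
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "a", "of");
        System.out.println(analyze("The Quick Fox of the Forest", stopWords));
        // prints [quick, fox, forest]
    }
}
```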

A grammar-based tokenizer constructed with JFlex.
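
JFlex generates a scanner from a grammar of regular-expression rules and, like most lexer generators, resolves overlapping rules by taking the longest match, with ties going to the rule listed first. The Java sketch below imitates that behavior by hand with java.util.regex. It is not generated by JFlex, and its two rules and the `NUM` and `ALPHANUM` type labels are hypothetical, far simpler than the real grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrammarTokenizerSketch {

    // A token's text plus the (hypothetical) rule that produced it.
    record Token(String text, String type) {}

    // Hypothetical grammar: rule i has type TYPES[i]; earlier rules win ties.
    private static final String[] TYPES = {"NUM", "ALPHANUM"};
    private static final Pattern[] RULES = {
        Pattern.compile("\\p{N}+(?:[.,]\\p{N}+)*"), // digits, optionally grouped by '.' or ','
        Pattern.compile("[\\p{L}\\p{N}]+")          // letters and digits
    };

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestLen = 0;
            String bestType = null;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(text).region(pos, text.length());
                // Longest match wins; strict '>' keeps the earlier rule on a tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestType = TYPES[i];
                }
            }
            if (bestType == null) {
                // No rule matches here: skip one code point (space, punctuation).
                pos += Character.charCount(text.codePointAt(pos));
            } else {
                tokens.add(new Token(text.substring(pos, pos + bestLen), bestType));
                pos += bestLen;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Lucene 9.8 ships 3rd")) {
            System.out.println(t.type() + " " + t.text());
        }
    }
}
```

On "Lucene 9.8 ships 3rd" it yields ALPHANUM "Lucene", NUM "9.8", ALPHANUM "ships" and ALPHANUM "3rd". The last token shows longest match at work: the NUM rule alone would stop after "3", but the ALPHANUM rule matches all three characters.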

This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
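
For a quick feel of word-boundary segmentation without Lucene, the JDK's java.text.BreakIterator offers a word instance. This is a swapped-in technique: the JDK applies its own boundary rules, which may differ in detail from UAX #29 and from this class. The `words` helper is hypothetical. It keeps only segments containing a letter or digit, because a word BreakIterator also reports the spans of spaces and punctuation between words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakSketch {

    // Returns the segments between consecutive word boundaries that contain
    // at least one letter or digit; space and punctuation segments are dropped.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world"));
        // prints [Hello, world]
    }
}
```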