Package-level declarations
Types
A CharacterIterator used internally with BreakIterator.
An abstract base class for simple, character-oriented tokenizers.
Removes elisions from a TokenStream. For example, "l'avion" (the plane) will be tokenized as "avion" (plane).
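As a rough illustration of what elision removal means for a single token, here is a minimal, self-contained sketch. It does not use the library's filter classes: the class name `ElisionSketch`, the method `stripElision`, and the article set are all hypothetical and chosen only for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ElisionSketch {
    // Hypothetical article set for illustration; an assumption, not the library's default list.
    static final Set<String> ARTICLES =
            new HashSet<>(Arrays.asList("l", "m", "t", "qu", "n", "s", "j", "d", "c"));

    /** Drops a leading elided article such as "l'" when it is in ARTICLES. */
    static String stripElision(String token) {
        int idx = -1;
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            // Accept both the ASCII apostrophe and the typographic one.
            if (ch == '\'' || ch == '\u2019') {
                idx = i;
                break;
            }
        }
        if (idx > 0) {
            String prefix = token.substring(0, idx).toLowerCase(Locale.ROOT);
            if (ARTICLES.contains(prefix)) {
                return token.substring(idx + 1);
            }
        }
        // No apostrophe, or the prefix is not a known article: leave the token alone.
        return token;
    }
}
```

Under these assumptions, `ElisionSketch.stripElision("l'avion")` yields `avion`, while a token whose prefix is not in the set, such as `o'clock`, is returned unchanged.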
Factory for ElisionFilter.
Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.
A StringBuilder that exposes its underlying char array for direct access.
Acts like an ever-growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer that holds only the characters that have not been freed yet. This is like a PushbackReader, except that you do not have to specify the maximum buffer size up front; in exchange, you must periodically call freeBefore.
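The circular-buffer idea can be sketched as follows. This is a simplified illustration and not the library's implementation: the class name `RollingBufferSketch`, the `capacity` accessor, the initial size of 4, and the choice to wrap I/O errors in `UncheckedIOException` are all assumptions made for the example. Only the `freeBefore` name comes from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class RollingBufferSketch {
    private final Reader reader;
    private char[] buffer = new char[4]; // deliberately tiny so growth is easy to observe
    private int start = 0;               // absolute position of the oldest retained char
    private int end = 0;                 // absolute position one past the last char read
    private boolean eof = false;

    RollingBufferSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns the char at absolute position pos, or -1 if pos is past the end of input. */
    int get(int pos) {
        if (pos < start) {
            throw new IllegalArgumentException("position " + pos + " was already freed");
        }
        while (pos >= end) {
            if (eof) {
                return -1;
            }
            int ch;
            try {
                ch = reader.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (ch == -1) {
                eof = true;
                return -1;
            }
            if (end - start == buffer.length) {
                grow(); // every slot holds a live char, so the array must expand
            }
            buffer[end % buffer.length] = (char) ch;
            end++;
        }
        return buffer[pos % buffer.length];
    }

    /** Declares that positions before pos will not be requested again, so their slots can be reused. */
    void freeBefore(int pos) {
        if (pos < start || pos > end) {
            throw new IllegalArgumentException("cannot free before position " + pos);
        }
        start = pos;
    }

    /** Doubles the array and re-places the retained chars using the new modulus. */
    private void grow() {
        char[] bigger = new char[buffer.length * 2];
        for (int p = start; p < end; p++) {
            bigger[p % bigger.length] = buffer[p % buffer.length];
        }
        buffer = bigger;
    }

    int capacity() {
        return buffer.length;
    }
}
```

In this sketch the array grows only when the number of retained characters reaches its length, which is why calling `freeBefore` regularly keeps memory use bounded by the size of the look-back window rather than by the size of the input.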
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
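The two-level approach described here (sentences first, then words within each sentence) can be illustrated with the JDK's `java.text.BreakIterator` alone. This is a sketch under stated assumptions, not the tokenizer itself: the class `SentenceWordsSketch` and its methods are hypothetical, and the whitespace split stands in for whatever word decomposition a real subclass would supply.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceWordsSketch {
    /** Splits text into sentences, then hands each sentence to words(). */
    static List<List<String>> segment(String text) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        List<List<String>> result = new ArrayList<>();
        int start = sentences.first();
        for (int end = sentences.next();
             end != BreakIterator.DONE;
             start = end, end = sentences.next()) {
            result.add(words(text.substring(start, end)));
        }
        return result;
    }

    /** Placeholder word decomposition: a plain whitespace split (an assumption for this example). */
    static List<String> words(String sentence) {
        List<String> out = new ArrayList<>();
        for (String w : sentence.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }
}
```

Keeping sentence detection separate from word decomposition means the per-sentence step can be swapped out, for example for a dictionary-based segmenter, without touching the sentence loop.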
Some commonly-used stemming functions.