Package-level declarations

Types

A CharacterIterator used internally with BreakIterator.

abstract class CharTokenizer : Tokenizer

An abstract base class for simple, character-oriented tokenizers.

class CSVUtil

Utility class for parsing CSV text.

Removes elisions from a TokenStream. For example, "l'avion" (the plane) will be tokenized as "avion" (plane).
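
The elision rule described above can be illustrated with a small standalone sketch. It is not this package's filter: `stripElision` and `defaultArticles` are hypothetical names, and the article list is an assumed set of common French elided forms. It only shows the idea of dropping a known article prefix that ends at an apostrophe.

```kotlin
// Hypothetical helper, NOT the library's filter: strips a leading elided
// article from a single token. The article set below is an assumption.
val defaultArticles = setOf("l", "m", "t", "qu", "n", "s", "j", "d", "c")

fun stripElision(token: String, articles: Set<String> = defaultArticles): String {
    // Find the first apostrophe, straight (') or typographic (\u2019).
    val idx = token.indexOfFirst { it == '\'' || it == '\u2019' }
    if (idx <= 0) return token // no apostrophe, or nothing before it

    // Strip the prefix only when it is one of the known elided articles.
    val prefix = token.substring(0, idx).lowercase()
    return if (prefix in articles) token.substring(idx + 1) else token
}
```

With these assumptions, `stripElision("l'avion")` yields `"avion"`, while a token such as `"aujourd'hui"` is returned unchanged because `"aujourd"` is not in the article set.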

class FilesystemResourceLoader(baseDirectory: Path, delegate: ResourceLoader) : ResourceLoader

Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.

A StringBuilder that gives direct access to its underlying array.

Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet. This is like a PushbackReader, except you don't have to specify up-front the max size of the buffer, but you do have to periodically call freeBefore.
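
The circular-buffer idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class documented here: `RollingBufferSketch`, its `get` method, and its constructor parameters are hypothetical, and only `freeBefore` is a name taken from the description. It uses `java.io.Reader`, so it assumes a JVM target.

```kotlin
import java.io.Reader

// Hypothetical sketch of a rolling character buffer; names other than
// freeBefore are assumptions made for illustration.
class RollingBufferSketch(private val reader: Reader, initialCapacity: Int = 8) {
    init { require(initialCapacity > 0) { "capacity must be positive" } }

    private var buffer = CharArray(initialCapacity)
    private var start = 0    // absolute position of the oldest retained char
    private var end = 0      // absolute position one past the last char read
    private var eof = false

    /** Returns the char at absolute position [pos], or -1 at end of input. */
    fun get(pos: Int): Int {
        require(pos >= start) { "position $pos was already freed" }
        // Read forward until the requested position is buffered.
        while (pos >= end && !eof) {
            val c = reader.read()
            if (c == -1) { eof = true; break }
            if (end - start == buffer.size) grow() // no free slot left
            buffer[end % buffer.size] = c.toChar()
            end++
        }
        return if (pos < end) buffer[pos % buffer.size].code else -1
    }

    /** Releases every char before [pos] so that its slot can be reused. */
    fun freeBefore(pos: Int) {
        require(pos in start..end) { "position $pos is outside the buffered range" }
        start = pos
    }

    // Doubles the capacity and re-maps the retained chars to their new slots.
    private fun grow() {
        val bigger = CharArray(buffer.size * 2)
        for (p in start until end) bigger[p % bigger.size] = buffer[p % buffer.size]
        buffer = bigger
    }
}
```

In this sketch the buffer grows only when unfreed characters fill every slot, which is why a caller that calls `freeBefore` regularly keeps memory use bounded regardless of input length.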

Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
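
The two-stage approach described above (sentences first, then words) can be sketched with the JDK's `java.text.BreakIterator`. This is only an illustration that assumes a JVM target: `splitSentences` and `splitWords` are hypothetical helpers, and the whitespace-based word split stands in for whatever a real subclass would do.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical helper: splits text into trimmed sentences using the JDK
// sentence BreakIterator. Not the class documented above.
fun splitSentences(text: String, locale: Locale = Locale.ROOT): List<String> {
    val boundaries = BreakIterator.getSentenceInstance(locale)
    boundaries.setText(text)

    val sentences = mutableListOf<String>()
    var start = boundaries.first()
    var end = boundaries.next()
    while (end != BreakIterator.DONE) {
        val sentence = text.substring(start, end).trim()
        if (sentence.isNotEmpty()) sentences.add(sentence)
        start = end
        end = boundaries.next()
    }
    return sentences
}

// Hypothetical second stage: a naive whitespace split standing in for the
// per-sentence word decomposition a subclass would supply.
fun splitWords(sentence: String): List<String> =
    sentence.split(Regex("\\s+")).filter { it.isNotEmpty() }
```

A caller would apply `splitWords` to each element returned by `splitSentences`, mirroring the division of labour between the base class and its subclasses.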

Some commonly-used stemming functions.