UnicodeWhitespaceTokenizer
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace, as defined by Unicode's WHITESPACE property. Each maximal run of adjacent non-whitespace characters forms one token.
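The splitting rule can be illustrated with a small standalone sketch. It does not use the tokenizer class itself: `isUnicodeWhitespace` and `splitAtUnicodeWhitespace` are hypothetical helper names, and the code point set is the Unicode White_Space property list as I understand it. The sketch only mirrors the "split at whitespace, keep the runs in between" behavior described above; it makes no attempt to reproduce offsets, attributes, or any token length limit the real class may apply.

```kotlin
// Assumption: code points carrying the Unicode White_Space property.
// All of them lie in the Basic Multilingual Plane, so a Char check suffices.
private val WHITE_SPACE: Set<Int> =
    ((0x0009..0x000D) + listOf(0x0020, 0x0085, 0x00A0, 0x1680) +
        (0x2000..0x200A) + listOf(0x2028, 0x2029, 0x202F, 0x205F, 0x3000)).toSet()

// Hypothetical helper: true if the character has the White_Space property.
fun isUnicodeWhitespace(c: Char): Boolean = c.code in WHITE_SPACE

// Hypothetical helper: returns each maximal run of non-whitespace characters.
fun splitAtUnicodeWhitespace(text: String): List<String> {
    val tokens = mutableListOf<String>()
    val current = StringBuilder()
    for (c in text) {
        if (isUnicodeWhitespace(c)) {
            // A whitespace character closes the token in progress, if any.
            if (current.isNotEmpty()) {
                tokens += current.toString()
                current.clear()
            }
        } else {
            current.append(c)
        }
    }
    // Emit the trailing token when the text does not end in whitespace.
    if (current.isNotEmpty()) tokens += current.toString()
    return tokens
}
```

With these helpers, `splitAtUnicodeWhitespace("foo\u00A0bar  baz\u3000qux")` returns the four tokens `foo`, `bar`, `baz`, and `qux`: the no-break space (U+00A0) and the ideographic space (U+3000) act as separators just like the ASCII space, and consecutive separators produce no empty tokens.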
Constructors
constructor()
Construct a new UnicodeWhitespaceTokenizer.
Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory.
Functions