UnicodeWhitespaceTokenizer

constructor()

Construct a new UnicodeWhitespaceTokenizer.


constructor(factory: AttributeFactory)

Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory.

Parameters

factory

the attribute factory to use for this Tokenizer


constructor(factory: AttributeFactory, maxTokenLen: Int)

Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory and a maximum token length.

Parameters

factory

the attribute factory to use for this Tokenizer

maxTokenLen

maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)

Throws

IllegalArgumentException

if maxTokenLen is invalid.
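The range rule for maxTokenLen can be illustrated with a minimal, self-contained sketch. It does not call the library: the helper `checkMaxTokenLen` and the local constant are hypothetical stand-ins that follow the wording of the parameter description (greater than 0 and less than 1024*1024), and it assumes an invalid value surfaces as an IllegalArgumentException. Whether the limit value itself is accepted should be checked against the actual implementation.

```kotlin
// Hypothetical stand-in for the limit named in the parameter description (1024*1024).
const val MAX_TOKEN_LENGTH_LIMIT: Int = 1024 * 1024

// Hypothetical helper, not part of the library. It mirrors the documented rule:
// maxTokenLen must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT.
fun checkMaxTokenLen(maxTokenLen: Int): Int {
    // require() throws IllegalArgumentException when the condition is false.
    require(maxTokenLen > 0 && maxTokenLen < MAX_TOKEN_LENGTH_LIMIT) {
        "maxTokenLen must be greater than 0 and less than $MAX_TOKEN_LENGTH_LIMIT, passed: $maxTokenLen"
    }
    return maxTokenLen
}

fun main() {
    // A typical value passes through unchanged.
    println(checkMaxTokenLen(255))

    // Zero, negative, and oversized values are rejected.
    for (bad in listOf(0, -1, 2 * MAX_TOKEN_LENGTH_LIMIT)) {
        try {
            checkMaxTokenLen(bad)
        } catch (e: IllegalArgumentException) {
            println("rejected $bad: ${e.message}")
        }
    }
}
```

When calling the three-argument constructor, validating a user-supplied length up front in this way keeps the failure close to the configuration code rather than inside tokenizer construction.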