LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the Character.isLetter() predicate.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
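The splitting rule described above can be illustrated with a minimal, standalone sketch. It does not use the LetterTokenizer class itself: `letterTokens` is a hypothetical helper written only for illustration, it relies on Kotlin's standard `Char.isLetter()`, and it assumes that checking individual UTF-16 `Char` values is close enough to show the idea. The real tokenizer may differ in details such as supplementary characters or token length limits.

```kotlin
// Hypothetical helper, not part of the library: collects maximal runs of
// adjacent letters, treating every non-letter character as a separator.
fun letterTokens(text: String): List<String> {
    val tokens = mutableListOf<String>()
    val current = StringBuilder()
    for (ch in text) {
        if (ch.isLetter()) {
            // Still inside a run of letters: extend the current token.
            current.append(ch)
        } else if (current.isNotEmpty()) {
            // A non-letter ends the current token.
            tokens.add(current.toString())
            current.clear()
        }
    }
    // Emit the trailing token if the text ends with a letter.
    if (current.isNotEmpty()) tokens.add(current.toString())
    return tokens
}

fun main() {
    // Punctuation, digits, and spaces all act as separators.
    println(letterTokens("Hello, world! It's 2024."))
    // prints [Hello, world, It, s]

    // Text without separators between words stays a single token,
    // which is why this rule suits some Asian languages poorly.
    println(letterTokens("我是中国人"))
    // prints [我是中国人]
}
```

Note that the apostrophe in "It's" splits the word in two and the digits in "2024" produce no token at all, since neither is a letter.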
Constructors
constructor()
Construct a new LetterTokenizer.
Construct a new LetterTokenizer using a given org.gnit.lucenekmp.util.AttributeFactory.
Construct a new LetterTokenizer using a given AttributeFactory.
Functions