StandardAnalyzer
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
Since
3.1
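The chain described above (tokenize, lowercase, remove stop words) can be illustrated with a minimal plain-Java sketch. This is not Lucene code: `AnalyzerPipelineSketch` and `analyze` are hypothetical names, and the regex split is only a crude stand-in for StandardTokenizer, whose actual segmentation rules are more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
class AnalyzerPipelineSketch {
    // Mimics the tokenize -> lowercase -> stop-word-removal order of the chain.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // Assumption: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String lower = raw.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (!stopWords.contains(lower)) {            // StopFilter step
                out.add(lower);
            }
        }
        return out;
    }
}
```

For instance, `AnalyzerPipelineSketch.analyze("The Quick Brown Fox", Set.of("the"))` yields `[quick, brown, fox]`. Lowercasing happens before the stop-word check, so a lowercase stop list also matches capitalized input.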
Constructors
Properties
The maximum allowed token length. Tokens longer than this are split at this length and emitted as multiple tokens. To skip such long tokens instead, increase this maximum and then use LengthFilter to remove them. The default is StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH.
Returns the ReuseStrategy in use.
An immutable set of stop words.
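The maximum token length behavior described above can be pictured with a small plain-Java sketch. It is an illustration under stated assumptions, not Lucene's implementation: `MaxTokenLengthSketch`, `chop`, and `dropLongerThan` are hypothetical names, and lengths are counted in Java `char` units.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class MaxTokenLengthSketch {
    // Splits a token into consecutive pieces of at most maxLen chars,
    // modelling "chopped up at this token length and emitted as multiple tokens".
    static List<String> chop(String token, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < token.length(); start += maxLen) {
            int end = Math.min(token.length(), start + maxLen);
            pieces.add(token.substring(start, end));
        }
        return pieces;
    }

    // Keeps only tokens of at most limit chars, the role the text gives to LengthFilter.
    static List<String> dropLongerThan(List<String> tokens, int limit) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() <= limit) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With a maximum of 4, `chop("abcdefghij", 4)` gives `[abcd, efgh, ij]`. Raising the maximum so the token survives intact and then calling `dropLongerThan` with the smaller limit removes it entirely, which is the alternative the description suggests.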
Functions
Just like getPositionIncrementGap, except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default value position increment gap is
Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
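The effect of the position increment gap described above can be sketched as a simplified model in plain Java. This is not Lucene's indexing code: `PositionGapSketch` and `positions` are hypothetical names, every token is assumed to carry a position increment of 1, and the gap is passed in explicitly rather than read from an analyzer.

```java
// Hypothetical illustration only; none of these names come from Lucene.
class PositionGapSketch {
    // values[i] holds the tokens of the i-th value indexed under one field name.
    // Returns the position assigned to each token.
    static int[][] positions(String[][] values, int gap) {
        int[][] result = new int[values.length][];
        int position = -1; // no terms added yet
        for (int v = 0; v < values.length; v++) {
            // The gap applies only if terms have already been added to the field.
            if (position >= 0) {
                position += gap;
            }
            result[v] = new int[values[v].length];
            for (int t = 0; t < values[v].length; t++) {
                position += 1; // assumed per-token position increment of 1
                result[v][t] = position;
            }
        }
        return result;
    }
}
```

For two values `{"quick", "fox"}` and `{"lazy", "dog"}`, a gap of 100 gives positions `[[0, 1], [102, 103]]`, so a phrase query cannot match across the boundary between the values. A gap of 0 gives `[[0, 1], [2, 3]]`, leaving the terms in successive positions.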