TokenStream
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
This is an abstract class; concrete subclasses are:
Tokenizer, a TokenStream whose input is a Reader; and
TokenFilter, a TokenStream whose input is another TokenStream.
TokenStream extends AttributeSource, which provides access to all of the token Attributes for the TokenStream. Note that only one instance per AttributeImpl is created and reused for every token. This approach reduces object creation and allows local caching of references to the AttributeImpls. See .incrementToken for further details.
The workflow of the new TokenStream API is as follows:
1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls TokenStream.reset.
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls .incrementToken until it returns false, consuming the attributes after each call.
5. The consumer calls .end so that any end-of-stream operations can be performed.
6. The consumer calls .close to release any resources when finished using the TokenStream.
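The consumer side of this workflow can be sketched without a Lucene dependency. The classes below (TermAttr, WhitespaceStream, WorkflowSketch) are hypothetical stand-ins and not part of Lucene; they only mirror the call order described above and the reuse of a single attribute instance for every token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a term attribute: one mutable instance, reused for every token.
final class TermAttr {
    String term = "";
    void clear() { term = ""; }
}

// Hypothetical stand-in for a Tokenizer that splits its input on whitespace.
final class WhitespaceStream implements AutoCloseable {
    private final String text;
    private final TermAttr termAttr = new TermAttr(); // attribute added at instantiation
    private String[] parts = new String[0];
    private int pos;

    WhitespaceStream(String text) { this.text = text; }

    TermAttr getTermAttr() { return termAttr; }

    // Prepares the stream for consumption; must be called before incrementToken.
    void reset() {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        pos = 0;
    }

    // Advances to the next token by overwriting the shared attribute instance.
    boolean incrementToken() {
        termAttr.clear();
        if (pos >= parts.length) return false;
        termAttr.term = parts[pos++];
        return true;
    }

    // End-of-stream hook; this sketch only clears the attribute.
    void end() { termAttr.clear(); }

    @Override public void close() { /* nothing to release in this sketch */ }
}

final class WorkflowSketch {
    static List<String> consume(String text) {
        List<String> terms = new ArrayList<>();
        try (WhitespaceStream stream = new WhitespaceStream(text)) { // step 1; step 6 on exit
            stream.reset();                       // step 2
            TermAttr term = stream.getTermAttr(); // step 3: keep a local reference
            while (stream.incrementToken()) {     // step 4
                terms.add(term.term);             // read the value before the next call
            }
            stream.end();                         // step 5
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume("the quick brown fox"));
    }
}
```

Because the attribute instance is reused, the consumer copies the value out after each call rather than holding on to the attribute object itself.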
To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in .incrementToken.
You can find some example code for the new API in the analysis package level Javadoc.
Sometimes it is desirable to capture the current state of a TokenStream, e.g., for buffering purposes (see CachingTokenFilter, TeeSinkTokenFilter). For this use case, AttributeSource.captureState and AttributeSource.restoreState can be used.
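The idea of capturing and restoring state can be illustrated with a simplified model. MiniAttributeSource and StateSketch below are hypothetical and not Lucene classes: attribute values are modelled as a plain map, and the exception thrown when the target lacks a captured attribute is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of an attribute source: values keyed by attribute name.
final class MiniAttributeSource {
    private final Map<String, String> values = new LinkedHashMap<>();

    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }

    // Returns an independent snapshot of the current attribute values.
    Map<String, String> captureState() { return new LinkedHashMap<>(values); }

    // Copies a snapshot back; the target must already hold every captured attribute.
    void restoreState(Map<String, String> state) {
        for (String key : state.keySet()) {
            if (!values.containsKey(key)) {
                throw new IllegalArgumentException("missing attribute: " + key);
            }
        }
        values.putAll(state);
    }
}

final class StateSketch {
    static String demo() {
        MiniAttributeSource source = new MiniAttributeSource();
        source.set("term", "first");
        Map<String, String> saved = source.captureState(); // buffer the current token
        source.set("term", "second");                      // the stream moves on
        source.restoreState(saved);                        // replay the buffered token
        return source.get("term");
    }
}
```

The snapshot is a copy, so later changes to the source do not affect it; this is what makes it usable for buffering.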
The TokenStream-API in Lucene is based on the decorator pattern. Therefore all non-abstract subclasses must be final or have at least a final implementation of .incrementToken! This is checked when Java assertions are enabled.
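A minimal sketch of the decorator arrangement follows. SharedTerm, SketchStream, ListSource, LowerCaseStage and DecoratorSketch are hypothetical names that are not part of Lucene; the sketch assumes that a filter shares its input's attribute instance, and it declares every concrete class final in line with the rule above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical shared attribute holding the current term.
final class SharedTerm { String value = ""; }

// Hypothetical abstract base, standing in for the abstract stream class.
abstract class SketchStream {
    final SharedTerm term;
    SketchStream(SharedTerm term) { this.term = term; }
    abstract boolean incrementToken();
}

// Source stage: emits the given words one at a time.
final class ListSource extends SketchStream {
    private final Iterator<String> it;
    ListSource(List<String> words) { super(new SharedTerm()); it = words.iterator(); }
    @Override boolean incrementToken() {
        if (!it.hasNext()) return false;
        term.value = it.next();
        return true;
    }
}

// Decorator stage: wraps another stream and lower-cases each term in place.
final class LowerCaseStage extends SketchStream {
    private final SketchStream input;
    LowerCaseStage(SketchStream input) { super(input.term); this.input = input; } // shares the attribute
    @Override boolean incrementToken() {
        if (!input.incrementToken()) return false;
        term.value = term.value.toLowerCase(Locale.ROOT);
        return true;
    }
}

final class DecoratorSketch {
    static List<String> run(List<String> words) {
        SketchStream stream = new LowerCaseStage(new ListSource(words));
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) out.add(stream.term.value);
        return out;
    }
}
```

Behaviour is extended by wrapping one stream in another rather than by overriding incrementToken in a subclass, which is why the concrete classes can be final.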
Functions
Expert: Adds a custom AttributeImpl instance with one or more Attribute interfaces.
Captures the state of all Attributes. The return value can be passed to .restoreState to restore the state of this or another AttributeSource.
Resets all Attributes in this AttributeSource by calling AttributeImpl.clear on each Attribute implementation.
Performs a clone of all AttributeImpl instances returned in a new AttributeSource instance. This method can be used to e.g. create another TokenStream with exactly the same attributes (using the AttributeSource(AttributeSource) constructor). You can also use it as a (non-performant) replacement for .captureState, if you need to look into / modify the captured state.
Copies the contents of this AttributeSource to the given target AttributeSource. The given instance has to provide all Attributes this instance contains. The actual attribute implementations must be identical in both AttributeSource instances; ideally both AttributeSource instances should use the same AttributeFactory. You can use this method as a replacement for .restoreState, if you use .cloneAttributes instead of .captureState.
Resets all Attributes in this AttributeSource by calling AttributeImpl.end on each Attribute implementation.
The caller must pass in a Class<? extends Attribute> value. Returns true, iff this AttributeSource contains the passed-in Attribute.
Returns true, iff this AttributeSource has any attributes.
Consumers (e.g., IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate AttributeImpls with the attributes of the next token.
This method returns the current attribute values as a string by calling the .reflectWith method.
This method is for introspection of attributes; it should simply add the key/values this AttributeSource holds to the given AttributeReflector.
Removes all attributes and their implementations from this AttributeSource.
Restores this state by copying the values of all attribute implementations that this state contains into the attribute implementations of the targetStream. The targetStream must contain a corresponding instance for each attribute contained in this state (e.g. it is not possible to restore the state of an AttributeSource containing a TermAttribute into an AttributeSource using a Token instance as implementation).