TokenStream

A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.

This is an abstract class; concrete subclasses are:

  • Tokenizer, a TokenStream whose input is a Reader; and

  • TokenFilter, a TokenStream whose input is another TokenStream.

TokenStream extends AttributeSource, which provides access to all of the token Attributes for the TokenStream. Note that only one instance per AttributeImpl is created and reused for every token. This approach reduces object creation and allows local caching of references to the AttributeImpls. See incrementToken for further details.

The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.

  2. The consumer calls TokenStream.reset.

  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.

  4. The consumer calls incrementToken until it returns false, consuming the attributes after each call.

  5. The consumer calls end so that any end-of-stream operations can be performed.

  6. The consumer calls close to release any resources when finished using the TokenStream.
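The consumer side of the workflow above can be sketched as follows. This is a minimal illustration only: ToyTermAttribute, ToyWhitespaceStream, and collectTerms are hypothetical stand-ins written for this example, not classes from the library, and they model just the call order of reset, incrementToken, end, and close.

```kotlin
// Toy stand-ins for illustration only; these are NOT the library's classes.
class ToyTermAttribute { var term: String = "" }

class ToyWhitespaceStream(private val text: String) {
    // Step 1: the single attribute instance is created at instantiation
    // and reused for every token.
    val termAtt = ToyTermAttribute()
    private var words: List<String> = emptyList()
    private var index = 0
    var closed = false
        private set

    fun reset() {
        words = text.split(" ").filter { it.isNotEmpty() }
        index = 0
    }

    fun incrementToken(): Boolean {
        if (index >= words.size) return false
        termAtt.term = words[index++]   // overwrite the shared attribute
        return true
    }

    fun end() { termAtt.term = "" }     // end-of-stream bookkeeping would go here

    fun close() { closed = true }       // release resources
}

fun collectTerms(stream: ToyWhitespaceStream): List<String> {
    val terms = mutableListOf<String>()
    try {
        stream.reset()                    // step 2
        val termAtt = stream.termAtt      // step 3: keep a local reference
        while (stream.incrementToken()) { // step 4
            terms += termAtt.term         // read the attribute after each call
        }
        stream.end()                      // step 5
    } finally {
        stream.close()                    // step 6
    }
    return terms
}

fun main() {
    println(collectTerms(ToyWhitespaceStream("quick brown fox")))
}
```

Because the attribute instance is reused, the consumer copies each term out of it before advancing; holding on to the attribute itself would only ever show the latest token.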

To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in incrementToken.

You can find some example code for the new API in the analysis package-level Javadoc.

Sometimes it is desirable to capture the current state of a TokenStream, e.g., for buffering purposes (see CachingTokenFilter, TeeSinkTokenFilter). For this use case, AttributeSource.captureState and AttributeSource.restoreState can be used.

The TokenStream API in Lucene is based on the decorator pattern. Therefore, all non-abstract subclasses must be final or have at least a final implementation of incrementToken. This is checked when Java assertions are enabled.
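The decorator arrangement can be illustrated with a small sketch. SketchTerm, SketchStream, ListStream, and UpperCaseSketchFilter are hypothetical names invented for this example and do not mirror the library's internals; the point is only that a filter wraps another stream, shares its attribute instance, and keeps incrementToken non-overridable.

```kotlin
// Hypothetical stand-ins, not the library's classes.
class SketchTerm { var text: String = "" }

abstract class SketchStream(val term: SketchTerm) {
    abstract fun incrementToken(): Boolean
}

// Source stream. Kotlin classes are final by default, so nothing can
// subclass ListStream and override incrementToken.
class ListStream(private val words: List<String>) : SketchStream(SketchTerm()) {
    private var i = 0
    override fun incrementToken(): Boolean {
        if (i >= words.size) return false
        term.text = words[i++]
        return true
    }
}

// Decorator: shares the wrapped stream's attribute instance and rewrites it in place.
// If this class were declared open, incrementToken would need `final override`.
class UpperCaseSketchFilter(private val input: SketchStream) : SketchStream(input.term) {
    override fun incrementToken(): Boolean {
        if (!input.incrementToken()) return false
        term.text = term.text.uppercase()
        return true
    }
}

fun upperCased(words: List<String>): List<String> {
    val stream: SketchStream = UpperCaseSketchFilter(ListStream(words))
    val out = mutableListOf<String>()
    while (stream.incrementToken()) out += stream.term.text
    return out
}

fun main() {
    println(upperCased(listOf("token", "Stream")))
}
```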

Inheritors

Types

object Companion

Properties

Functions

fun <T : Attribute> addAttribute(attClass: KClass<T>): T

The caller must pass in the KClass of an Attribute type. This method first checks whether an instance of that class is already in this AttributeSource and, if so, returns it. Otherwise a new instance is created, added to this AttributeSource, and returned.
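The get-or-create behaviour described here can be sketched roughly as below. All names (SketchAttribute, TermSketch, OffsetSketch, SketchAttributeSource) are hypothetical, and the factory map is an assumption made for this example; how the real AttributeSource instantiates attribute implementations is not shown on this page.

```kotlin
import kotlin.reflect.KClass

// Hypothetical stand-ins, not the library's types.
interface SketchAttribute
class TermSketch : SketchAttribute { var text = "" }
class OffsetSketch : SketchAttribute { var start = 0; var end = 0 }

class SketchAttributeSource {
    // Assumption of this sketch: a fixed table of factories keyed by attribute class.
    private val factories: Map<KClass<out SketchAttribute>, () -> SketchAttribute> = mapOf(
        TermSketch::class to { TermSketch() },
        OffsetSketch::class to { OffsetSketch() }
    )
    private val attributes = mutableMapOf<KClass<out SketchAttribute>, SketchAttribute>()

    // Returns the existing instance, or creates, registers, and returns a new one.
    @Suppress("UNCHECKED_CAST")
    fun <T : SketchAttribute> addAttribute(attClass: KClass<T>): T {
        val instance = attributes.getOrPut(attClass) {
            factories.getValue(attClass).invoke()
        }
        return instance as T
    }

    // Returns the registered instance, or null if none was added.
    @Suppress("UNCHECKED_CAST")
    fun <T : SketchAttribute> getAttribute(attClass: KClass<T>): T? {
        return attributes[attClass] as T?
    }

    fun hasAttribute(attClass: KClass<out SketchAttribute>): Boolean = attClass in attributes
}

fun main() {
    val source = SketchAttributeSource()
    val first = source.addAttribute(TermSketch::class)
    val second = source.addAttribute(TermSketch::class)
    println(first === second)                          // same instance is reused
    println(source.getAttribute(OffsetSketch::class))  // never added
}
```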

Expert: Adds a custom AttributeImpl instance with one or more Attribute interfaces.

Captures the state of all Attributes. The return value can be passed to restoreState to restore the state of this or another AttributeSource.
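As a rough illustration of the capture-then-restore idea, the sketch below snapshots two attribute values and later copies them back. SketchSource and Snapshot are hypothetical names for this example only; the library's actual State type and its internals are not shown on this page.

```kotlin
// Hypothetical stand-ins; not the library's AttributeSource or State.
class SketchSource {
    var term: String = ""
    var startOffset: Int = 0

    // Immutable snapshot of every attribute value at capture time.
    data class Snapshot(val term: String, val startOffset: Int)

    fun captureState(): Snapshot = Snapshot(term, startOffset)

    // Copies the captured values back into the live attributes.
    fun restoreState(state: Snapshot) {
        term = state.term
        startOffset = state.startOffset
    }
}

fun replayFirstToken(): String {
    val source = SketchSource()
    source.term = "first"
    source.startOffset = 0
    val saved = source.captureState()   // buffer the current token

    source.term = "second"              // the stream moves on
    source.startOffset = 6

    source.restoreState(saved)          // bring the buffered token back
    return "${source.term}@${source.startOffset}"
}

fun main() {
    println(replayFirstToken())
}
```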

Resets all Attributes in this AttributeSource by calling AttributeImpl.clear on each Attribute implementation.

Clones all AttributeImpl instances and returns them in a new AttributeSource instance. This method can be used, for example, to create another TokenStream with exactly the same attributes (using the AttributeSource constructor). You can also use it as a (non-performant) replacement for captureState if you need to look into or modify the captured state.

open override fun close()

Releases resources associated with this stream.

fun copyTo(target: AttributeSource)

Copies the contents of this AttributeSource to the given target AttributeSource. The given instance has to provide all Attributes this instance contains. The actual attribute implementations must be identical in both AttributeSource instances; ideally both AttributeSource instances should use the same AttributeFactory. You can use this method as a replacement for restoreState if you use cloneAttributes instead of captureState.

open fun end()

This method is called by the consumer after the last token has been consumed, that is, after incrementToken has returned false (using the new TokenStream API). Streams implementing the old API should upgrade to use this feature.

Resets all Attributes in this AttributeSource by calling AttributeImpl.end on each Attribute implementation.

open operator override fun equals(obj: Any?): Boolean
fun <T : Attribute> getAttribute(attClass: KClass<T>): T?

Returns the instance of the passed-in Attribute contained in this AttributeSource, or null if this AttributeSource does not contain it.

fun hasAttribute(attClass: KClass<out Attribute>): Boolean

The caller must pass in the KClass of an Attribute type. Returns true if and only if this AttributeSource contains the passed-in Attribute.

Returns true if and only if this AttributeSource has any attributes.

open override fun hashCode(): Int
abstract fun incrementToken(): Boolean

Consumers (e.g., IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate AttributeImpls with the attributes of the next token.

fun reflectAsString(prependAttClass: Boolean): String

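This method returns the current attribute values as a string by calling the reflectWith method. If prependAttClass is true, the format is "AttributeClass#key=value,AttributeClass#key=value"; if it is false, the format is "key=value,key=value".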

This method is for introspection of attributes; it should simply add the key/value pairs this AttributeSource holds to the given AttributeReflector.

Removes all attributes and their implementations from this AttributeSource.

open fun reset()

This method is called by a consumer before it begins consumption using incrementToken.

Restores this state by copying the values of all attribute implementations that this state contains into the attribute implementations of the targetStream. The targetStream must contain a corresponding instance for each attribute contained in this state (e.g., it is not possible to restore the state of an AttributeSource containing a TermAttribute into an AttributeSource using a Token instance as implementation).

open override fun toString(): String

Returns a string consisting of the class's simple name, the hex representation of the identity hash code, and the current reflection of all attributes.