TermVectorsWriter

Codec API for writing term vectors:

  1. For every document, .startDocument is called, informing the Codec how many fields will be written.

  2. .startField is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.

  3. Within each field, .startTerm is called for each term.

  4. If offsets and/or positions are enabled, then .addPosition will be called for each term occurrence.

  5. After all documents have been written, .finish is called for verification/sanity-checks.

  6. Finally, the writer is closed (.close).
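The call order above can be sketched with a small stand-in. `SketchVectorsWriter`, `RecordingWriter`, and `writeOneDocument` are hypothetical names used only for illustration. They mirror a subset of the methods listed on this page, with `String` standing in for `FieldInfo` and `BytesRef` and the payload argument omitted, and are not part of the library.

```kotlin
// Hypothetical stand-in that mirrors the call protocol; not the real class.
interface SketchVectorsWriter : AutoCloseable {
    fun startDocument(numVectorFields: Int)
    fun startField(name: String, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean)
    fun startTerm(term: String, freq: Int)
    fun addPosition(position: Int, startOffset: Int, endOffset: Int)
    fun finish(numDocs: Int)
}

// Records each call as a string so the sequence can be inspected.
class RecordingWriter : SketchVectorsWriter {
    val calls = mutableListOf<String>()

    override fun startDocument(numVectorFields: Int) {
        calls += "startDocument($numVectorFields)"
    }

    override fun startField(name: String, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean) {
        calls += "startField($name, $numTerms)"
    }

    override fun startTerm(term: String, freq: Int) {
        calls += "startTerm($term, $freq)"
    }

    override fun addPosition(position: Int, startOffset: Int, endOffset: Int) {
        calls += "addPosition($position, $startOffset, $endOffset)"
    }

    override fun finish(numDocs: Int) {
        calls += "finish($numDocs)"
    }

    override fun close() {
        calls += "close"
    }
}

// Writes one document with one field ("body") that holds the term "lucene" twice.
fun writeOneDocument(writer: SketchVectorsWriter) {
    writer.startDocument(1)                          // step 1: one vector field follows
    writer.startField("body", 1, true, true, false)  // step 2: one term, positions and offsets on
    writer.startTerm("lucene", 2)                    // step 3: term with freq = 2
    writer.addPosition(0, 0, 6)                      // step 4: first occurrence
    writer.addPosition(3, 20, 26)                    // step 4: second occurrence
    writer.finish(1)                                 // step 5: one document was written
    writer.close()                                   // step 6
}

fun main() {
    val writer = RecordingWriter()
    writeOneDocument(writer)
    writer.calls.forEach(::println)
}
```

The sketch leaves out .finishTerm, .finishField, and .finishDocument, which are documented below and would be called after each term, field, and document respectively.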

Properties

Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).

Functions

abstract fun addPosition(position: Int, startOffset: Int, endOffset: Int, payload: BytesRef?)

Adds a term position and its offsets.

open fun addProx(numProx: Int, positions: DataInput?, offsets: DataInput?)

Called by IndexWriter when writing new segments.

abstract override fun close()

abstract fun finish(numDocs: Int)

Called before .close, passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to .startDocument), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
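A minimal sketch of the consistency check described above. `DocCountCheck` is a hypothetical helper, not part of the library, and failing with `IllegalStateException` (via `check`) is an assumption; a real Codec may report the mismatch differently.

```kotlin
// Hypothetical helper: counts startDocument calls and verifies the total in finish.
class DocCountCheck {
    private var docsStarted = 0

    // Call this from startDocument.
    fun onStartDocument() {
        docsStarted++
    }

    // Call this from finish; throws IllegalStateException if the counts disagree.
    fun onFinish(numDocs: Int) {
        check(docsStarted == numDocs) {
            "startDocument was called $docsStarted times, but finish was given numDocs=$numDocs"
        }
    }
}

fun main() {
    val counter = DocCountCheck()
    repeat(3) { counter.onStartDocument() }
    counter.onFinish(3)                               // counts agree, returns normally
    val mismatch = runCatching { counter.onFinish(4) }
    println("mismatch detected: ${mismatch.isFailure}")
}
```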

open fun finishDocument()

Called after a doc and all its fields have been added.

open fun finishField()

Called after a field and all its terms have been added.

open fun finishTerm()

Called after a term and all its positions have been added.

open fun merge(mergeState: MergeState): Int

Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents and uses .startDocument, .startField, .startTerm, .addPosition, and .finish, returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc.).
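The skip-deleted-and-count behaviour can be illustrated with a simplified model. `SketchDoc` and `mergeSketch` are hypothetical; the real method works on the readers in `MergeState`, whose API is not shown on this page, so this is only a sketch of the control flow.

```kotlin
// Hypothetical, simplified stand-in for a document inside a segment.
data class SketchDoc(val id: Int, val deleted: Boolean = false)

// Visits every live document across all segments and returns how many were written.
fun mergeSketch(segments: List<List<SketchDoc>>, writeDoc: (SketchDoc) -> Unit): Int {
    var written = 0
    for (segment in segments) {
        for (doc in segment) {
            if (doc.deleted) continue   // deleted documents are skipped
            writeDoc(doc)               // a real writer would call startDocument, startField, ...
            written++
        }
    }
    return written                      // a real writer would pass this count to finish
}

fun main() {
    val segments = listOf(
        listOf(SketchDoc(0), SketchDoc(1, deleted = true)),
        listOf(SketchDoc(2)),
    )
    val copied = mutableListOf<Int>()
    val numDocs = mergeSketch(segments) { copied += it.id }
    println("wrote $numDocs documents: $copied")
}
```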

abstract fun ramBytesUsed(): Long

Return the memory usage of this object in bytes. Negative values are illegal.

abstract fun startDocument(numVectorFields: Int)

Called before writing the term vectors of the document. .startField will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in that case numVectorFields will be zero.

abstract fun startField(info: FieldInfo?, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean)

Called before writing the terms of the field. .startTerm will be called numTerms times.

abstract fun startTerm(term: BytesRef?, freq: Int)

Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then .addPosition will be called freq times.
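One way an implementation might verify this contract is to count .addPosition calls per term. `PositionCountCheck` is a hypothetical helper, not part of the library, and checking the count in .finishTerm is an assumption about where such a check would live.

```kotlin
// Hypothetical tracker: verifies that addPosition ran freq times for each term.
class PositionCountCheck(private val hasPositionsOrOffsets: Boolean) {
    private var expected = 0
    private var seen = 0

    // Call from startTerm; no positions are expected when both features are off.
    fun onStartTerm(freq: Int) {
        expected = if (hasPositionsOrOffsets) freq else 0
        seen = 0
    }

    // Call from addPosition.
    fun onAddPosition() {
        seen++
    }

    // Call from finishTerm; throws IllegalStateException on a mismatch.
    fun onFinishTerm() {
        check(seen == expected) { "expected $expected addPosition calls, saw $seen" }
    }
}

fun main() {
    val tracker = PositionCountCheck(hasPositionsOrOffsets = true)
    tracker.onStartTerm(2)
    tracker.onAddPosition()
    tracker.onAddPosition()
    tracker.onFinishTerm()   // two calls for freq = 2, returns normally
    println("term accepted")
}
```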