core/org.gnit.lucenekmp.codecs/TermVectorsWriter

TermVectorsWriter

abstract class TermVectorsWriter : AutoCloseable, Accountable

Codec API for writing term vectors:

For every document, .startDocument is called, informing the Codec how many fields will be written.
.startField is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.
Within each field, .startTerm is called for each term.
If offsets and/or positions are enabled, then .addPosition will be called for each term occurrence.
After all documents have been written, .finish is called for verification/sanity-checks.
Finally, the writer is closed (.close).
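The call sequence above can be sketched as a small driver. This is only an illustration under stated assumptions: SketchVectorsWriter, RecordingWriter, Occurrence, and writeAll are hypothetical names, and plain String values stand in for FieldInfo and BytesRef so that the example does not depend on the real library classes. Payloads and the finishDocument/finishField/finishTerm hooks are left out for brevity.

```kotlin
// Hypothetical, simplified stand-in for TermVectorsWriter (not the real class):
// field info and terms are plain Strings instead of FieldInfo / BytesRef.
abstract class SketchVectorsWriter {
    abstract fun startDocument(numVectorFields: Int)
    abstract fun startField(name: String, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean)
    abstract fun startTerm(term: String, freq: Int)
    abstract fun addPosition(position: Int, startOffset: Int, endOffset: Int)
    abstract fun finish(numDocs: Int)
    abstract fun close()
}

// Records every call so that the order can be inspected afterwards.
class RecordingWriter : SketchVectorsWriter() {
    val calls = mutableListOf<String>()
    override fun startDocument(numVectorFields: Int) { calls += "startDocument($numVectorFields)" }
    override fun startField(name: String, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean) {
        calls += "startField($name, $numTerms)"
    }
    override fun startTerm(term: String, freq: Int) { calls += "startTerm($term, $freq)" }
    override fun addPosition(position: Int, startOffset: Int, endOffset: Int) { calls += "addPosition($position)" }
    override fun finish(numDocs: Int) { calls += "finish($numDocs)" }
    override fun close() { calls += "close" }
}

// One occurrence of a term: its position plus character offsets.
data class Occurrence(val position: Int, val startOffset: Int, val endOffset: Int)

// A document is modeled as: field name -> (term -> occurrences of that term).
fun writeAll(writer: SketchVectorsWriter, docs: List<Map<String, Map<String, List<Occurrence>>>>) {
    for (doc in docs) {
        writer.startDocument(doc.size)                      // number of vector fields
        for ((field, terms) in doc) {
            writer.startField(field, terms.size, positions = true, offsets = true, payloads = false)
            for ((term, occurrences) in terms) {
                writer.startTerm(term, occurrences.size)    // freq = number of occurrences
                for (o in occurrences) {
                    writer.addPosition(o.position, o.startOffset, o.endOffset)
                }
            }
        }
    }
    writer.finish(docs.size)                                // verification hook
    writer.close()
}
```

Passing a document with no fields (an empty map) to writeAll still produces a startDocument(0) call, which mirrors the zero-field case described for .startDocument.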

Inheritors

Lucene90CompressingTermVectorsWriter

Properties

childResources

open val childResources: MutableCollection<Accountable>

Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
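One way to satisfy the point-in-time requirement is to return a copy of the internal collection. The sketch below assumes a hypothetical SketchAccountable base class in place of the real Accountable interface, and the Buffer and BufferingWriter classes are invented purely for illustration.

```kotlin
// Hypothetical stand-in for Accountable; not the real lucenekmp interface.
abstract class SketchAccountable {
    abstract fun ramBytesUsed(): Long
    open val childResources: Collection<SketchAccountable> get() = emptyList()
}

// A child resource that reports a fixed size.
class Buffer(private val bytes: Long) : SketchAccountable() {
    override fun ramBytesUsed(): Long = bytes
}

class BufferingWriter : SketchAccountable() {
    private val buffers = mutableListOf<Buffer>()

    fun allocate(bytes: Long) { buffers += Buffer(bytes) }

    // Never negative here, because every buffer reports a non-negative size.
    override fun ramBytesUsed(): Long = buffers.sumOf { it.ramBytesUsed() }

    // toList() copies the list, so a snapshot handed out earlier is not
    // affected by later calls to allocate().
    override val childResources: Collection<SketchAccountable> get() = buffers.toList()
}
```

Returning the backing list directly would let callers observe later mutations, which is the kind of race the snapshot wording warns against.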

Functions

addPosition

abstract fun addPosition(position: Int, startOffset: Int, endOffset: Int, payload: BytesRef?)

Adds a term position and offsets

addProx

open fun addProx(numProx: Int, positions: DataInput?, offsets: DataInput?)

Called by IndexWriter when writing new segments.

close

abstract override fun close()

finish

abstract fun finish(numDocs: Int)

Called before .close, passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to .startDocument), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
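The redundancy check can be as simple as counting .startDocument calls and comparing the count in .finish. The helper below is a hypothetical sketch (DocCountCheck and its method names are not part of the documented API); a real implementation would keep the counter inside the writer itself and may report the mismatch with a different exception type.

```kotlin
// Hypothetical helper: tracks how many documents were started and verifies
// that the numDocs value passed to finish matches that count.
class DocCountCheck {
    private var started = 0

    // Call from startDocument.
    fun onStartDocument() { started++ }

    // Call from finish; throws IllegalStateException on a mismatch.
    fun onFinish(numDocs: Int) {
        check(started == numDocs) {
            "wrote $started documents but finish() was told $numDocs"
        }
    }
}
```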

finishDocument

open fun finishDocument()

Called after a doc and all its fields have been added.

finishField

open fun finishField()

Called after a field and all its terms have been added.

finishTerm

open fun finishTerm()

Called after a term and all its positions have been added.

merge

open fun merge(mergeState: MergeState): Int

Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses .startDocument, .startField, .startTerm, .addPosition, and .finish, returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
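The document-by-document strategy described above might look roughly like the following. Everything here is a simplified assumption: SketchReader replaces MergeState with a list of documents plus a liveness flag per document, terms carry only a frequency (so .addPosition is omitted), and SketchMergeWriter and CountingMergeWriter are hypothetical names.

```kotlin
// Hypothetical stand-in for one source segment: docs[i] maps
// field name -> (term -> freq), and live[i] is false if doc i is deleted.
class SketchReader(
    val docs: List<Map<String, Map<String, Int>>>,
    val live: List<Boolean>
)

abstract class SketchMergeWriter {
    abstract fun startDocument(numVectorFields: Int)
    abstract fun startField(name: String, numTerms: Int)
    abstract fun startTerm(term: String, freq: Int)
    abstract fun finish(numDocs: Int)

    // Replays every live document through the write methods and returns
    // the number of documents written.
    open fun merge(readers: List<SketchReader>): Int {
        var written = 0
        for (reader in readers) {
            for ((i, doc) in reader.docs.withIndex()) {
                if (!reader.live[i]) continue          // skip deleted documents
                startDocument(doc.size)
                for ((field, terms) in doc) {
                    startField(field, terms.size)
                    for ((term, freq) in terms) startTerm(term, freq)
                }
                written++
            }
        }
        finish(written)
        return written
    }
}

// Minimal concrete writer that only counts what it receives.
class CountingMergeWriter : SketchMergeWriter() {
    var docs = 0
    var terms = 0
    var finishedWith = -1
    override fun startDocument(numVectorFields: Int) { docs++ }
    override fun startField(name: String, numTerms: Int) {}
    override fun startTerm(term: String, freq: Int) { terms++ }
    override fun finish(numDocs: Int) { finishedWith = numDocs }
}
```

An override that copies bytes in bulk would replace the inner loops while keeping the same contract: skip deleted documents and return the number written.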

ramBytesUsed

abstract fun ramBytesUsed(): Long

Return the memory usage of this object in bytes. Negative values are illegal.

startDocument

abstract fun startDocument(numVectorFields: Int)

Called before writing the term vectors of the document. .startField will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.

startField

abstract fun startField(info: FieldInfo?, numTerms: Int, positions: Boolean, offsets: Boolean, payloads: Boolean)

Called before writing the terms of the field. .startTerm will be called numTerms times.

startTerm

abstract fun startTerm(term: BytesRef?, freq: Int)

Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then .addPosition will be called freq times for this term.