IndexWriter

open class IndexWriter(d: Directory, conf: IndexWriterConfig) : AutoCloseable, TwoPhaseCommit, Accountable, MergePolicy.MergeContext

An IndexWriter creates and maintains an index.

The OpenMode option set through IndexWriterConfig.setOpenMode(OpenMode) determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with OpenMode.CREATE even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. If OpenMode.CREATE_OR_APPEND is used, IndexWriter will create a new index if there is not already an index at the provided path and otherwise open the existing index.

In either case, documents are added with addDocument and removed with deleteDocuments(vararg terms: Term) or deleteDocuments(vararg queries: Query). A document can be updated with updateDocument (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, close should be called.

Each method that changes the index returns a Long sequence number, which expresses the effective order in which each change was applied. The commit method also returns a sequence number, describing which changes are in the commit point and which are not. Sequence numbers are transient (not saved into the index in any way) and only valid within a single IndexWriter instance.
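
As a rough illustration of how these sequence numbers can be used, the sketch below splits a list of recorded operations by whether they fall inside a commit. It assumes, following upstream Lucene's description of commit, that an operation is reflected in a commit when its sequence number is less than or equal to the value commit returned. RecordedOp and splitByCommit are hypothetical names for this example, not part of IndexWriter.

```kotlin
// Hypothetical record of one index-changing call and the sequence number it returned.
data class RecordedOp(val description: String, val seqNo: Long)

// Splits operations into (included in the commit, not included), assuming an operation
// is part of the commit when its sequence number is <= the commit's sequence number.
fun splitByCommit(
    ops: List<RecordedOp>,
    commitSeqNo: Long
): Pair<List<RecordedOp>, List<RecordedOp>> =
    ops.partition { it.seqNo <= commitSeqNo }

fun main() {
    val ops = listOf(
        RecordedOp("addDocument(a)", 1L),
        RecordedOp("deleteDocuments(id:7)", 2L),
        RecordedOp("addDocument(b)", 4L)
    )
    val (committed, pending) = splitByCommit(ops, commitSeqNo = 3L)
    println("committed=${committed.map { it.description }}")
    println("pending=${pending.map { it.description }}")
}
```

Because sequence numbers are only valid within one IndexWriter instance, a record like this should be discarded when the writer is closed and reopened.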

These changes are buffered in memory and periodically flushed to the Directory (during the above method calls). A flush is triggered when there are enough added documents since the last flush. Flushing is triggered either by RAM usage of the documents (see IndexWriterConfig.setRAMBufferSizeMB) or the number of added documents (see IndexWriterConfig.setMaxBufferedDocs). The default is to flush when RAM usage hits IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. In contrast to the other flush options IndexWriterConfig.setRAMBufferSizeMB and IndexWriterConfig.setMaxBufferedDocs, deleted terms won't trigger a segment flush. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either commit or close is called. A flush may also trigger one or more segment merges, which by default run within a background thread so as not to block the addDocument calls (see below for changing the MergeScheduler).
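
The two flush triggers can be pictured with a small decision function. This is only a sketch of the rule described above, not the writer's actual flush policy: FlushSettings and shouldFlush are hypothetical names, and treating a non-positive setting as "trigger disabled" is an assumption made for the example.

```kotlin
// Hypothetical stand-in for the two flush settings described above.
// A value <= 0 means "this trigger is disabled" (an assumption of this sketch).
data class FlushSettings(val ramBufferSizeMB: Double, val maxBufferedDocs: Int)

// Returns true when either enabled trigger has been reached.
fun shouldFlush(settings: FlushSettings, bufferedBytes: Long, bufferedDocs: Int): Boolean {
    val ramLimitBytes = (settings.ramBufferSizeMB * 1024 * 1024).toLong()
    val byRam = settings.ramBufferSizeMB > 0 && bufferedBytes >= ramLimitBytes
    val byDocCount = settings.maxBufferedDocs > 0 && bufferedDocs >= settings.maxBufferedDocs
    return byRam || byDocCount
}

fun main() {
    val ramOnly = FlushSettings(ramBufferSizeMB = 64.0, maxBufferedDocs = -1)
    // Many small documents do not trigger a flush while RAM usage stays below the limit.
    println(shouldFlush(ramOnly, bufferedBytes = 10L * 1024 * 1024, bufferedDocs = 500_000))
    // Reaching the RAM limit does.
    println(shouldFlush(ramOnly, bufferedBytes = 64L * 1024 * 1024, bufferedDocs = 10))
}
```

Note that buffered delete terms do not appear as an input here, which matches the statement that they do not trigger a segment flush on their own.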

Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a LockObtainFailedException.

Expert: IndexWriter allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done. Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, either because this is useful for your application, or to give readers enough time to refresh to the new commit without having the old commit deleted out from under them. The latter is necessary when multiple computers take turns opening their own IndexWriter and IndexReaders against a single shared index mounted via remote filesystems like NFS which do not support "delete on last close" semantics. A single computer accessing an index via NFS is fine with the default deletion policy since NFS clients emulate "delete on last close" locally. That said, accessing an index via NFS will likely result in poor performance compared to a local IO device.
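
The core decision a custom deletion policy makes, which commits to drop and which to keep, might be sketched as follows. CommitPoint and keepLastN are hypothetical stand-ins written for this example; they do not implement the real IndexDeletionPolicy interface, and the oldest-first ordering of the list is an assumption.

```kotlin
// Hypothetical stand-in for a commit point; delete() only marks it in this sketch.
class CommitPoint(val generation: Long) {
    var deleted = false
        private set

    fun delete() {
        deleted = true
    }
}

// Keeps the newest `keep` commits and deletes the rest.
// `commits` is assumed to be ordered oldest first.
fun keepLastN(commits: List<CommitPoint>, keep: Int) {
    require(keep >= 1) { "must keep at least one commit" }
    commits.dropLast(keep).forEach { it.delete() }
}

fun main() {
    val commits = (1L..4L).map { CommitPoint(it) }
    keepLastN(commits, keep = 2)
    println(commits.map { "gen ${it.generation} deleted=${it.deleted}" })
}
```

With keep = 1 this behaves like the default described above: every prior commit is removed once a new one exists.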

Expert: IndexWriter allows you to separately change the MergePolicy and the MergeScheduler. The MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. The default is TieredMergePolicy. Then, the MergeScheduler is invoked with the requested merges and it decides when and how to run the merges. The default is ConcurrentMergeScheduler.

NOTE: if you hit an Error, or disaster strikes during a checkpoint, IndexWriter will close itself. This is a defensive measure in case any internal state (buffered documents, deletions, reference counts) was corrupted. Any subsequent calls will throw an AlreadyClosedException.

NOTE: IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead.
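
One way to follow this advice is to guard application-level critical sections with a lock the application owns. The sketch below targets the JVM and uses a hypothetical DocSink interface in place of the writer, so it runs on its own; BatchIndexer is likewise an illustrative name.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical stand-in for the single writer call this sketch needs.
fun interface DocSink {
    fun add(doc: String): Long
}

// Groups several writer calls into one application-level critical section,
// guarded by a private lock rather than by the writer instance itself.
class BatchIndexer(private val sink: DocSink) {
    private val batchLock = ReentrantLock() // application-owned lock

    // Returns the sequence number of the final add, or -1 for an empty batch.
    fun addBatch(docs: List<String>): Long = batchLock.withLock {
        var last = -1L
        for (doc in docs) last = sink.add(doc)
        last
    }
}

fun main() {
    var counter = 0L
    val indexer = BatchIndexer(DocSink { ++counter })
    println(indexer.addBatch(listOf("a", "b", "c")))
}
```

The lock serializes whole batches against each other only; individual writer calls made elsewhere remain safe without it.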

NOTE: If you call Thread.interrupt() on a thread that's within IndexWriter, IndexWriter will try to catch this (e.g., if it's in a wait() or Thread.sleep()), and will then throw the unchecked exception ThreadInterruptedException and clear the interrupt status on the thread.

Constructors

IndexWriter

constructor(d: Directory, conf: IndexWriterConfig)

Types

Companion

object Companion

DocStats

class DocStats(val maxDoc: Int, val numDocs: Int)

DocStats for this index.

IndexReaderWarmer

fun interface IndexReaderWarmer

If DirectoryReader.open has been called (ie, this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits. This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes.

Properties

childResources

open val childResources: MutableCollection<Accountable>

Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).

config

var config: LiveIndexWriterConfig

docWriter

val docWriter: DocumentsWriter

globalFieldNumberMap

val globalFieldNumberMap: FieldInfos.FieldNumbers

infoStream

open lateinit override var infoStream: InfoStream

If enabled, information about merges will be printed to this.

mergingSegments

open override val mergingSegments: HashSet<SegmentCommitInfo>

Returns an unmodifiable set of segments that are currently merging.

Functions

addDocument

fun addDocument(doc: Iterable<out IndexableField>): Long

Adds a document to this index.

addDocuments

fun addDocuments(docs: Iterable<Iterable<IndexableField>>): Long

Atomically adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.

addIndexes

fun addIndexes(vararg readers: CodecReader): Long

Merges the provided indexes into this index.

fun addIndexes(vararg dirs: Directory): Long

Adds all segments from an array of indexes into this index.

addIndexesReaderMerge

fun addIndexesReaderMerge(merge: MergePolicy.OneMerge)

Runs a single merge operation for IndexWriter.addIndexes.

advanceSegmentInfosVersion

fun advanceSegmentInfosVersion(newVersion: Long)

If SegmentInfos.getVersion is below newVersion then update it to this value.

cloneSegmentInfos

fun cloneSegmentInfos(): SegmentInfos

Tests should use this method to snapshot the current segmentInfos to have a consistent view.

close

open override fun close()

Closes all open resources and releases the write lock.

commit

open override fun commit(): Long

Commits all pending changes (added and deleted documents, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary.

decRefDeleter

fun decRefDeleter(segmentInfos: SegmentInfos)

Record that the files referenced by this SegmentInfos are no longer in use. Only call this if you are sure you previously called .incRefDeleter.

deleteAll

fun deleteAll(): Long

Delete all documents in the index.

deleteDocuments

fun deleteDocuments(vararg terms: Term): Long

Deletes the document(s) containing any of the terms. All given deletes are applied and flushed atomically at the same time.

fun deleteDocuments(vararg queries: Query): Long

Deletes the document(s) matching any of the provided queries. All given deletes are applied and flushed atomically at the same time.

deleteUnusedFiles

fun deleteUnusedFiles()

Expert: remove any index files that are no longer used.

ensureOpen

fun ensureOpen(failIfClosing: Boolean = true)

Used internally to throw an AlreadyClosedException if this IndexWriter has been closed (closed=true) or is in the process of closing (closing=true).

executeMerge

fun executeMerge(trigger: MergeTrigger)

flush

fun flush()

Moves all in-memory segments to the Directory, but does not commit (fsync) them (call .commit for that).

fun flush(triggerMerge: Boolean, applyAllDeletes: Boolean)

Flush all in-memory buffered updates (adds and deletes) to the Directory.

flushNextBuffer

fun flushNextBuffer(): Boolean

Expert: Flushes the next pending writer per thread buffer if available or the largest active non-pending writer per thread buffer in the calling thread. This can be used to flush documents to disk outside of an indexing thread. In contrast to .flush this won't mark all currently active indexing buffers as flush-pending.

forceApply

fun forceApply(updates: FrozenBufferedUpdates)

forceMerge

@JvmOverloads

fun forceMerge(maxNumSegments: Int, doWait: Boolean = true)

Forces merge policy to merge segments until there are <= maxNumSegments. The actual merges to be executed are determined by the MergePolicy.

forceMergeDeletes

@JvmOverloads

fun forceMergeDeletes(doWait: Boolean = true)

Forces merging of all segments that have deleted documents. The actual merges to be executed are determined by the MergePolicy. For example, the default TieredMergePolicy will only pick a segment if the percentage of deleted docs is over 10%.
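
The 10% figure in the example above amounts to a simple per-segment check, sketched here. The function names are hypothetical and the threshold is a parameter; the real selection logic lives in the MergePolicy and may consider more than this ratio.

```kotlin
// Percentage of a segment's documents that are deleted.
fun deletedPct(delCount: Int, maxDoc: Int): Double {
    require(maxDoc > 0 && delCount in 0..maxDoc) { "invalid segment counts" }
    return 100.0 * delCount / maxDoc
}

// True when the segment's deleted percentage is strictly over the threshold.
fun exceedsDeleteThreshold(delCount: Int, maxDoc: Int, thresholdPct: Double = 10.0): Boolean =
    deletedPct(delCount, maxDoc) > thresholdPct

fun main() {
    println(exceedsDeleteThreshold(delCount = 10, maxDoc = 100)) // exactly at the threshold
    println(exceedsDeleteThreshold(delCount = 11, maxDoc = 100)) // just over it
}
```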

getAnalyzer

fun getAnalyzer(): Analyzer

Returns the analyzer used by this index.

getBufferedDeleteTermsSize

fun getBufferedDeleteTermsSize(): Int

getDirectory

fun getDirectory(): Directory

Returns the Directory used by this index.

getDocStats

fun getDocStats(): IndexWriter.DocStats

Returns accurate DocStats for this writer. Reading maxDoc and numDocs separately is racy: numDocs can change after maxDoc is fetched, so that numDocs appears greater than maxDoc, which makes it hard to get accurate document stats from IndexWriter. This method returns both values as one consistent snapshot.
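
Because both counters come from one snapshot, derived values such as the number of deleted documents can be computed safely. The sketch below uses a local DocStatsSnapshot class that mirrors the documented DocStats(maxDoc, numDocs) shape so that it runs without the library; interpreting maxDoc minus numDocs as the deleted count follows upstream Lucene and is an assumption here.

```kotlin
// Local stand-in mirroring the documented DocStats(maxDoc, numDocs) shape.
data class DocStatsSnapshot(val maxDoc: Int, val numDocs: Int)

// Documents that are deleted but still counted in maxDoc.
fun deletedDocs(stats: DocStatsSnapshot): Int {
    check(stats.numDocs <= stats.maxDoc) { "inconsistent snapshot: numDocs > maxDoc" }
    return stats.maxDoc - stats.numDocs
}

fun main() {
    val stats = DocStatsSnapshot(maxDoc = 1_000, numDocs = 940)
    println("deleted=${deletedDocs(stats)}")
}
```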

getDocsWriter

fun getDocsWriter(): DocumentsWriter

getFieldNames

fun getFieldNames(): MutableSet<String>

Return an unmodifiable set of all field names as visible from this IndexWriter, across all segments of the index.

getFlushCount

fun getFlushCount(): Int

getFlushDeletesCount

fun getFlushDeletesCount(): Int

getFlushingBytes

fun getFlushingBytes(): Long

Returns the number of bytes currently being flushed.

getLiveCommitData

fun getLiveCommitData(): Iterable<MutableMap.MutableEntry<String, String>>?

Returns the commit user data iterable previously set with .setLiveCommitData, or null if nothing has been set yet.

getMaxCompletedSequenceNumber

fun getMaxCompletedSequenceNumber(): Long

Returns the highest sequence number across all completed operations, or 0 if no operations have finished yet. Still in-flight operations (in other threads) are not counted until they finish.

getNumBufferedDocuments

fun getNumBufferedDocuments(): Int

getPendingNumDocs

fun getPendingNumDocs(): Long

Returns the number of documents in the index, including documents that are being added (i.e., reserved).

getPooledInstance

fun getPooledInstance(info: SegmentCommitInfo, create: Boolean): ReadersAndUpdates?

getReader

fun getReader(applyAllDeletes: Boolean, writeAllDeletes: Boolean): DirectoryReader

Expert: returns a readonly reader, covering all committed as well as un-committed changes to the index. This provides "near real-time" searching, in that changes made during an IndexWriter session can be quickly made available for searching without closing the writer nor calling .commit.

getSegmentCount

fun getSegmentCount(): Int

getTragicException

fun getTragicException(): Throwable?

If this IndexWriter was closed as a side-effect of a tragic exception, e.g. disk full while flushing a new segment, this returns the root cause exception. Otherwise (no tragic exception has occurred) it returns null.

hasChangesInRam

fun hasChangesInRam(): Boolean

Returns true if there are any changes or deletes that are not flushed or applied.

hasDeletions

fun hasDeletions(): Boolean

Returns true if this index has deletions (including buffered deletions). Note that this will return true if there are buffered Term/Query deletions, even if it turns out those buffered deletions don't match any documents.

hasPendingMerges

fun hasPendingMerges(): Boolean

Expert: returns true if there are merges waiting to be scheduled.

hasUncommittedChanges

fun hasUncommittedChanges(): Boolean

Returns true if there may be changes that have not been committed. There are cases where this may return true when there are no actual "real" changes to the index, for example if you've deleted by Term or Query but that Term or Query does not match any documents. Also, if a merge kicked off as a result of flushing a new segment during .commit, or a concurrent merge finished, this method may return true right after you had just called .commit.

incRefDeleter

fun incRefDeleter(segmentInfos: SegmentInfos)

Record that the files referenced by this SegmentInfos are still in use.

isClosed

fun isClosed(): Boolean

isDeleterClosed

fun isDeleterClosed(): Boolean

isFullyDeleted

fun isFullyDeleted(readersAndUpdates: ReadersAndUpdates): Boolean

isOpen

fun isOpen(): Boolean

Returns true if this IndexWriter is still open.

maxDoc

fun maxDoc(i: Int): Int

maybeMerge

fun maybeMerge()

Expert: asks the mergePolicy whether any merges are necessary now and, if so, runs the requested merges and then iterates (tests again whether merges are needed) until no more merges are returned by the mergePolicy.

mergeInit

fun mergeInit(merge: MergePolicy.OneMerge)

Does initial setup for a merge, which is fast but holds the synchronized lock on the IndexWriter instance.

newestSegment

fun newestSegment(): SegmentCommitInfo?

nrtIsCurrent

fun nrtIsCurrent(infos: SegmentInfos): Boolean

numDeletedDocs

open override fun numDeletedDocs(info: SegmentCommitInfo): Int

Obtain the number of deleted docs for a pooled reader. If the reader isn't being pooled, the segmentInfo's delCount is returned.

numDeletesToMerge

open override fun numDeletesToMerge(info: SegmentCommitInfo): Int

Returns the number of deletes a merge would claim back if the given segment is merged.

numRamDocs

fun numRamDocs(): Int

Expert: Return the number of documents currently buffered in RAM.

onTragicEvent

fun onTragicEvent(tragedy: Throwable, location: String)

This method should be called on a tragic event, i.e., if a downstream class of the writer hits an unrecoverable exception. This method does not rethrow the tragic event exception.

prepareCommit

open override fun prepareCommit(): Long

Expert: prepare for commit. This does the first phase of 2-phase commit. This method does all steps necessary to commit changes since this writer was opened: flushes pending added and deleted docs, syncs the index files, and writes most of the next segments_N file. After calling this you must call either .commit to finish the commit, or .rollback to revert the commit and undo all changes done since the writer was opened.
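
The usual reason to split a commit into two phases is to coordinate it with other resources. The following sketch shows that pattern under stated assumptions: TwoPhaseResource is a hypothetical stand-in exposing the same three calls, and commitAll is an illustrative helper, not part of the library.

```kotlin
// Hypothetical stand-in exposing the three calls used in a two-phase commit.
interface TwoPhaseResource {
    fun prepareCommit(): Long
    fun commit(): Long
    fun rollback()
}

// Prepares every resource first; commits all of them only if every prepare succeeded,
// otherwise rolls all of them back. Returns true when the commit went through.
fun commitAll(resources: List<TwoPhaseResource>): Boolean {
    try {
        resources.forEach { it.prepareCommit() }
    } catch (e: Exception) {
        // Best-effort rollback of every resource, whether or not it was prepared.
        resources.forEach { res -> runCatching { res.rollback() } }
        return false
    }
    resources.forEach { it.commit() }
    return true
}
```

Keep in mind that rollback on an IndexWriter also closes the writer, so after a failed attempt a new writer has to be opened before indexing can continue.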

ramBytesUsed

open override fun ramBytesUsed(): Long

Return the memory usage of this object in bytes. Negative values are illegal.

release

fun release(readersAndUpdates: ReadersAndUpdates)

rollback

open override fun rollback()

Close the IndexWriter without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called). This removes any temporary files that had been created, after which the state of the index will be the same as it was when commit() was last called or when this writer was first opened. This also clears a previous call to .prepareCommit.

segString

fun segString(): String

Returns a string description of all segments, for debugging.

fun segString(infos: Iterable<SegmentCommitInfo>): String

setLiveCommitData

fun setLiveCommitData(commitUserData: Iterable<MutableMap.MutableEntry<String, String>>)

Sets the iterator to provide the commit user data map at commit time. Calling this method is considered a committable change and will be committed even if there are no other changes in this writer. Note that you must call this method before .prepareCommit. Otherwise it won't be included in the follow-on .commit.

fun setLiveCommitData(commitUserData: Iterable<MutableMap.MutableEntry<String, String>>, doIncrementVersion: Boolean)

Sets the commit user data iterator, controlling whether to advance the SegmentInfos.getVersion.

softUpdateDocument

fun softUpdateDocument(term: Term?, doc: Iterable<out IndexableField>, vararg softDeletes: Field): Long

Expert: Updates a document by first updating the document(s) containing term with the given doc-values fields and then adding the new document. The doc-values update and the subsequent addition are atomic, as seen by a reader on the same index (a flush may happen only after the addition).

softUpdateDocuments

fun softUpdateDocuments(term: Term?, docs: Iterable<Iterable<IndexableField>>, vararg softDeletes: Field): Long

Expert: Atomically updates documents matching the provided term with the given doc-values fields and adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.

toLiveInfos

fun toLiveInfos(sis: SegmentInfos): SegmentInfos

tryApply

fun tryApply(updates: FrozenBufferedUpdates): Boolean

Translates a frozen packet of delete term/query, or doc values updates, into their actual docIDs in the index, and applies the change. This is a heavy operation and is done concurrently by incoming indexing threads. This method will return immediately without blocking if another thread is currently applying the packet. In order to ensure the packet has been applied, IndexWriter.forceApply must be called.

tryDeleteDocument

fun tryDeleteDocument(readerIn: IndexReader, docID: Int): Long

Expert: attempts to delete by document ID, as long as the provided reader is a near-real-time reader (from DirectoryReader.open). If the provided reader is an NRT reader obtained from this writer, and its segment has not been merged away, then the delete succeeds and this method returns a valid (> 0) sequence number; else, it returns -1 and the caller must then separately delete by Term or Query.
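
The "try the fast path, then fall back" flow described here could be wrapped as below. Deleter is a hypothetical stand-in for the two delete paths so that the sketch runs without the library; with the real writer the fallback would be a deleteDocuments call with a Term or Query that identifies the document.

```kotlin
// Hypothetical stand-in for the two delete paths; the Long results are sequence numbers.
interface Deleter {
    fun tryDeleteByDocId(docId: Int): Long // returns -1 when the fast path cannot be used
    fun deleteByTerm(field: String, value: String): Long
}

// Attempts the by-ID delete first and falls back to a by-term delete on -1.
fun deleteWithFallback(d: Deleter, docId: Int, idField: String, idValue: String): Long {
    val seqNo = d.tryDeleteByDocId(docId)
    return if (seqNo != -1L) seqNo else d.deleteByTerm(idField, idValue)
}
```

The fallback requires that each document carries a field that identifies it uniquely, since the document ID alone is no longer usable once its segment has been merged away.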

tryUpdateDocValue

fun tryUpdateDocValue(readerIn: IndexReader, docID: Int, vararg fields: Field): Long

Expert: attempts to update doc values by document ID, as long as the provided reader is a near-real-time reader (from DirectoryReader.open). If the provided reader is an NRT reader obtained from this writer, and its segment has not been merged away, then the update succeeds and this method returns a valid (> 0) sequence number; else, it returns -1 and the caller must then resolve the document again and retry the update. If a doc values field's data is null the existing value is removed from all documents matching the term. This can be used to un-delete a soft-deleted document since this method will apply the field update even if the document is marked as deleted.

updateBinaryDocValue

fun updateBinaryDocValue(term: Term, field: String, value: BytesRef): Long

Updates a document's BinaryDocValues for field to the given value. You can only update fields that already exist in the index, not add new fields through this method. You can only update fields that were indexed only with doc values.

updateDocument

open fun updateDocument(term: Term?, doc: Iterable<IndexableField>): Long

Updates a document by first deleting the document(s) containing term and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add).

updateDocuments

fun updateDocuments(delTerm: Term, docs: Iterable<Iterable<IndexableField>>): Long

Atomically deletes documents matching the provided delTerm and adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.

fun updateDocuments(delQuery: Query, docs: Iterable<Iterable<IndexableField>>): Long

Similar to .updateDocuments, but takes a query instead of a term to identify the documents to be updated.

updateDocValues

fun updateDocValues(term: Term, vararg updates: Field): Long

Updates documents' DocValues fields to the given values. Each field update is applied, with the same value, to the set of documents that are associated with the Term. All updates are atomically applied and flushed together. If a doc values field's data is null the existing value is removed from all documents matching the term.

updateNumericDocValue

fun updateNumericDocValue(term: Term, field: String, value: Long): Long

Updates a document's NumericDocValues for field to the given value. You can only update fields that already exist in the index, not add new fields through this method. You can only update fields that were indexed with doc values only.

waitForMerges

fun waitForMerges()

Wait for any currently outstanding merges to finish.

writeSomeDocValuesUpdates

fun writeSomeDocValuesUpdates()