IndexWriter

An IndexWriter creates and maintains an index.

The {@link OpenMode} option on {@link IndexWriterConfig#setOpenMode(OpenMode)} determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with {@link OpenMode#CREATE} even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. If {@link OpenMode#CREATE_OR_APPEND} is used, IndexWriter creates a new index if there is not already an index at the provided path, and otherwise opens the existing index.

In either case, documents are added with {@link #addDocument(Iterable) addDocument} and removed with {@link #deleteDocuments(Term...)} or {@link #deleteDocuments(Query...)}. A document can be updated with {@link #updateDocument(Term, Iterable) updateDocument} (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, {@link #close() close} should be called.

Each method that changes the index returns a {@code long} sequence number, which expresses the effective order in which each change was applied. {@link #commit} also returns a sequence number, describing which changes are in the commit point and which are not. Sequence numbers are transient (not saved into the index in any way) and only valid within a single {@code IndexWriter} instance.
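The ordering contract above can be pictured with a small self-contained model. This is only a sketch: `ToyWriter` and its methods are hypothetical names, not part of the IndexWriter API, and the assumption that a commit consumes its own sequence number is made purely for illustration.

```kotlin
// Toy model of the sequence-number contract; ToyWriter is hypothetical
// and does not reflect how IndexWriter is implemented.
class ToyWriter {
    private var nextSeqNo = 1L
    private val log = mutableListOf<Pair<Long, String>>()

    // Every change returns the next sequence number.
    fun addDocument(id: String): Long = record("add:$id")
    fun deleteDocument(id: String): Long = record("delete:$id")

    // Assumption for this sketch: a commit also takes a sequence number,
    // and changes with a smaller number belong to the commit point.
    fun commit(): Long = record("commit")

    // Changes included in the commit that returned commitSeqNo.
    fun changesInCommit(commitSeqNo: Long): List<String> =
        log.filter { it.first < commitSeqNo && it.second != "commit" }.map { it.second }

    private fun record(op: String): Long {
        val seqNo = nextSeqNo++
        log += seqNo to op
        return seqNo
    }
}

fun main() {
    val writer = ToyWriter()
    val a = writer.addDocument("1")
    val b = writer.deleteDocument("1")
    val c = writer.commit()
    val d = writer.addDocument("2")
    println("a=$a b=$b commit=$c d=$d")
    println(writer.changesInCommit(c)) // the add made after the commit is not listed
}
```

Because the numbers are transient, a model like this would start again from its initial value for every new writer instance.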

These changes are buffered in memory and periodically flushed to the {@link Directory} (during the above method calls). A flush is triggered when there are enough added documents since the last flush. Flushing is triggered either by RAM usage of the documents (see {@link IndexWriterConfig#setRAMBufferSizeMB}) or the number of added documents (see {@link IndexWriterConfig#setMaxBufferedDocs(int)}). The default is to flush when RAM usage hits {@link IndexWriterConfig#DEFAULT_RAM_BUFFER_SIZE_MB} MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. In contrast to the other flush options {@link IndexWriterConfig#setRAMBufferSizeMB} and {@link IndexWriterConfig#setMaxBufferedDocs(int)}, deleted terms won't trigger a segment flush. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either {@link #commit()} or {@link #close} is called. A flush may also trigger one or more segment merges, which by default run within a background thread so as not to block the addDocument calls (see below for changing the {@link MergeScheduler}).
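The two flush triggers described above can be modelled as a simple predicate. This is a sketch under stated assumptions: `FlushTrigger` is a hypothetical class, `null` is used here to mean "trigger disabled", and the threshold values are example numbers rather than the library's defaults.

```kotlin
// Toy model of the RAM-based and document-count-based flush triggers.
// FlushTrigger is hypothetical; it is not how IndexWriter decides to flush.
class FlushTrigger(
    private val ramBufferSizeMB: Double?, // null disables the RAM trigger
    private val maxBufferedDocs: Int?     // null disables the doc-count trigger
) {
    init {
        require(ramBufferSizeMB != null || maxBufferedDocs != null) {
            "at least one flush trigger must be enabled"
        }
    }

    // True when either enabled threshold has been reached.
    fun shouldFlush(bufferedRamMB: Double, bufferedDocs: Int): Boolean {
        val byRam = ramBufferSizeMB != null && bufferedRamMB >= ramBufferSizeMB
        val byDocs = maxBufferedDocs != null && bufferedDocs >= maxBufferedDocs
        return byRam || byDocs
    }
}

fun main() {
    // Flush by RAM only, with an example 16 MB buffer.
    val byRam = FlushTrigger(ramBufferSizeMB = 16.0, maxBufferedDocs = null)
    println(byRam.shouldFlush(bufferedRamMB = 4.0, bufferedDocs = 100_000))
    println(byRam.shouldFlush(bufferedRamMB = 16.5, bufferedDocs = 10))
}
```

With only the RAM trigger enabled, the number of buffered documents never forces a flush, which matches the advice above to flush by RAM usage for best indexing speed.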

Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a {@link LockObtainFailedException}.

Expert: IndexWriter allows an optional {@link IndexDeletionPolicy} implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is {@link KeepOnlyLastCommitDeletionPolicy} which removes all prior commits as soon as a new commit is done. Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, either because this is useful for your application, or to give readers enough time to refresh to the new commit without having the old commit deleted out from under them. The latter is necessary when multiple computers take turns opening their own {@code IndexWriter} and {@code IndexReader}s against a single shared index mounted via remote filesystems like NFS which do not support "delete on last close" semantics. A single computer accessing an index via NFS is fine with the default deletion policy since NFS clients emulate "delete on last close" locally. That said, accessing an index via NFS will likely result in poor performance compared to a local IO device.
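The difference between the default policy and a policy that keeps older commits alive can be sketched with a toy model. All names below (`ToyDeletionPolicy`, `keepOnlyLast`, `keepLast`) are hypothetical and commits are represented as plain strings; the real {@link IndexDeletionPolicy} contract is richer than this.

```kotlin
// Toy model of commit deletion policies; not the IndexDeletionPolicy API.
fun interface ToyDeletionPolicy {
    // Given all commits, oldest first, returns the commits to delete.
    fun commitsToDelete(commits: List<String>): List<String>
}

// Mirrors the described default: only the most recent commit survives.
val keepOnlyLast = ToyDeletionPolicy { commits -> commits.dropLast(1) }

// Keeps the newest n commits so that slow readers can still open older ones.
fun keepLast(n: Int): ToyDeletionPolicy {
    require(n >= 1) { "must keep at least one commit" }
    return ToyDeletionPolicy { commits -> commits.dropLast(n) }
}

fun main() {
    val commits = listOf("segments_1", "segments_2", "segments_3")
    println(keepOnlyLast.commitsToDelete(commits))
    println(keepLast(2).commitsToDelete(commits))
}
```

A count-based policy is only one option; a time-based policy that keeps commits for a fixed period would serve the same purpose of giving readers time to refresh.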

Expert: IndexWriter allows you to separately change the {@link MergePolicy} and the {@link MergeScheduler}. The {@link MergePolicy} is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a {@link MergePolicy.MergeSpecification} describing the merges. The default is {@link TieredMergePolicy}. Then, the {@link MergeScheduler} is invoked with the requested merges and it decides when and how to run the merges. The default is {@link ConcurrentMergeScheduler}.
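The division of labour between the two components can be sketched as follows. This is a toy model only: `ToyMerge`, `ToyMergePolicy`, `ToyMergeScheduler`, `smallSegmentsPolicy` and `serialScheduler` are hypothetical names, and the size-threshold rule is an invented example rather than the behaviour of any shipped merge policy.

```kotlin
// Toy model of the policy/scheduler split; not the MergePolicy or MergeScheduler API.
data class ToyMerge(val segments: List<String>)

fun interface ToyMergePolicy {
    // Selects which merges to do, if any.
    fun findMerges(segmentSizes: Map<String, Long>): List<ToyMerge>
}

fun interface ToyMergeScheduler {
    // Decides when and how the requested merges run.
    fun merge(merges: List<ToyMerge>, run: (ToyMerge) -> Unit)
}

// Example policy: merge all segments below the threshold into one,
// provided there are at least two of them.
fun smallSegmentsPolicy(maxSmallBytes: Long) = ToyMergePolicy { sizes ->
    val small = sizes.filterValues { it < maxSmallBytes }.keys.sorted()
    if (small.size >= 2) listOf(ToyMerge(small)) else emptyList()
}

// Example scheduler: runs each merge immediately on the calling thread.
val serialScheduler = ToyMergeScheduler { merges, run -> merges.forEach(run) }

fun main() {
    val sizes = mapOf("_0" to 10L, "_1" to 20L, "_2" to 5_000L)
    val executed = mutableListOf<ToyMerge>()
    val requested = smallSegmentsPolicy(maxSmallBytes = 1_000L).findMerges(sizes)
    serialScheduler.merge(requested) { executed += it }
    println(executed)
}
```

Swapping `serialScheduler` for one that hands merges to background threads would change when the merges run without touching the selection logic, which is the point of keeping the two roles separate.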

NOTE: if you hit an Error, or disaster strikes during a checkpoint, then IndexWriter will close itself. This is a defensive measure in case any internal state (buffered documents, deletions, reference counts) was corrupted. Any subsequent calls will throw an AlreadyClosedException.

NOTE: {@link IndexWriter} instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead.

NOTE: If you call Thread.interrupt() on a thread that's within IndexWriter, IndexWriter will try to catch this (e.g., if it's in a wait() or Thread.sleep()), and will then throw the unchecked exception {@link ThreadInterruptedException} and clear the interrupt status on the thread.

Constructors

constructor(d: Directory, conf: IndexWriterConfig)

Types

object Companion
class DocStats(val maxDoc: Int, val numDocs: Int)

DocStats for this index

fun interface IndexReaderWarmer

If DirectoryReader.open has been called (i.e., this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits. This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes.

Properties


Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).

open lateinit override var infoStream: InfoStream

If enabled, information about merges will be printed to this.


Returns an unmodifiable set of segments that are currently merging.

Functions


Adds a document to this index.


Atomically adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.

fun addIndexes(vararg readers: CodecReader): Long

Merges the provided indexes into this index.

fun addIndexes(vararg dirs: Directory): Long

Adds all segments from an array of indexes into this index.


Runs a single merge operation for IndexWriter.addIndexes.


If SegmentInfos.getVersion is below newVersion then update it to this value.


Tests should use this method to snapshot the current segmentInfos to have a consistent view.

open override fun close()

Closes all open resources and releases the write lock.

open override fun commit(): Long

Commits all pending changes (added and deleted documents, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary.

fun decRefDeleter(segmentInfos: SegmentInfos)

Record that the files referenced by this SegmentInfos are no longer in use. Only call this if you are sure you previously called .incRefDeleter.


Delete all documents in the index.

fun deleteDocuments(vararg terms: Term): Long

Deletes the document(s) containing any of the terms. All given deletes are applied and flushed atomically at the same time.

fun deleteDocuments(vararg queries: Query): Long

Deletes the document(s) matching any of the provided queries. All given deletes are applied and flushed atomically at the same time.


Expert: remove any index files that are no longer used.

fun ensureOpen(failIfClosing: Boolean = true)

Used internally to throw an AlreadyClosedException if this IndexWriter has been closed (closed=true) or is in the process of closing (closing=true).

fun flush()

Moves all in-memory segments to the Directory, but does not commit (fsync) them (call .commit for that).

fun flush(triggerMerge: Boolean, applyAllDeletes: Boolean)

Flush all in-memory buffered updates (adds and deletes) to the Directory.


Expert: Flushes the next pending writer per thread buffer if available or the largest active non-pending writer per thread buffer in the calling thread. This can be used to flush documents to disk outside of an indexing thread. In contrast to .flush this won't mark all currently active indexing buffers as flush-pending.


Translates a frozen packet of delete term/query, or doc values updates, into their actual docIDs in the index, and applies the change. This is a heavy operation and is done concurrently by incoming indexing threads.

fun forceMerge(maxNumSegments: Int, doWait: Boolean = true)

Forces merge policy to merge segments until there are <= maxNumSegments. The actual merges to be executed are determined by the MergePolicy.


Forces merging of all segments that have deleted documents. The actual merges to be executed are determined by the MergePolicy. For example, the default TieredMergePolicy will only pick a segment if the percentage of deleted docs is over 10%.


Returns the analyzer used by this index.


Returns the Directory used by this index.


Returns accurate DocStats for this writer. Reading the two values separately is unreliable: for instance, numDocs can change after maxDoc has been fetched, which can make numDocs appear greater than maxDoc. This method returns both values as one consistent snapshot.


Return an unmodifiable set of all field names as visible from this IndexWriter, across all segments of the index.


Returns the number of bytes currently being flushed.


Returns the commit user data iterable previously set with .setLiveCommitData, or null if nothing has been set yet.


Returns the highest sequence number across all completed operations, or 0 if no operations have finished yet. Still in-flight operations (in other threads) are not counted until they finish.


Returns the number of documents in the index, including documents that are being added (i.e., reserved).

fun getReader(applyAllDeletes: Boolean, writeAllDeletes: Boolean): DirectoryReader

Expert: returns a read-only reader, covering all committed as well as uncommitted changes to the index. This provides "near real-time" searching, in that changes made during an IndexWriter session can be quickly made available for searching without closing the writer or calling .commit.


If this IndexWriter was closed as a side-effect of a tragic exception, e.g. disk full while flushing a new segment, this returns the root cause exception. Otherwise (no tragic exception has occurred) it returns null.


Returns true if there are any changes or deletes that are not flushed or applied.


Returns true if this index has deletions (including buffered deletions). Note that this will return true if there are buffered Term/Query deletions, even if it turns out those buffered deletions don't match any documents.


Expert: returns true if there are merges waiting to be scheduled.


Returns true if there may be changes that have not been committed. There are cases where this may return true when there are no actual "real" changes to the index, for example if you've deleted by Term or Query but that Term or Query does not match any documents. Also, if a merge kicked off as a result of flushing a new segment during .commit, or a concurrent merge finished, this method may return true right after you had just called .commit.

fun incRefDeleter(segmentInfos: SegmentInfos)

Record that the files referenced by this SegmentInfos are still in use.
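The pairing requirement between incRefDeleter and decRefDeleter follows the usual reference-counting pattern, sketched below. `ToyFileRefCounts` is a hypothetical class that works on plain file names; it is an illustration of the pattern, not the writer's actual file deleter.

```kotlin
// Toy reference counter for index files; hypothetical, for illustration only.
class ToyFileRefCounts {
    private val counts = mutableMapOf<String, Int>()

    // Records that the given files are in use by one more holder.
    fun incRef(files: Collection<String>) {
        for (file in files) counts[file] = (counts[file] ?: 0) + 1
    }

    // Releases one reference per file and returns the files that are now unused.
    // Fails if a file was never incRef'ed, mirroring the "only call this if you
    // previously called incRef" rule.
    fun decRef(files: Collection<String>): List<String> {
        val deletable = mutableListOf<String>()
        for (file in files) {
            val count = checkNotNull(counts[file]) { "decRef without matching incRef: $file" }
            if (count == 1) {
                counts.remove(file)
                deletable += file
            } else {
                counts[file] = count - 1
            }
        }
        return deletable
    }
}

fun main() {
    val refs = ToyFileRefCounts()
    val snapshot = listOf("_0.cfs", "segments_1")
    refs.incRef(snapshot)          // first holder
    refs.incRef(snapshot)          // second holder of the same files
    println(refs.decRef(snapshot)) // still referenced once, nothing to delete
    println(refs.decRef(snapshot)) // last reference released
}
```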

fun isFullyDeleted(readersAndUpdates: ReadersAndUpdates): Boolean
Link copied to clipboard

Returns true if this IndexWriter is still open.

fun maxDoc(i: Int): Int
Link copied to clipboard

Expert: asks the mergePolicy whether any merges are necessary now and, if so, runs the requested merges and then iterates (tests again if merges are needed) until no more merges are returned by the mergePolicy.


Does initial setup for a merge, which is fast but holds the synchronized lock on IndexWriter instance.

open override fun numDeletedDocs(info: SegmentCommitInfo): Int

Obtain the number of deleted docs for a pooled reader. If the reader isn't being pooled, the segmentInfo's delCount is returned.

open override fun numDeletesToMerge(info: SegmentCommitInfo): Int

Returns the number of deletes a merge would claim back if the given segment is merged.


Expert: Return the number of documents currently buffered in RAM.

fun onTragicEvent(tragedy: Throwable, location: String)

This method should be called on a tragic event, i.e., if a downstream class of the writer hits an unrecoverable exception. This method does not rethrow the tragic event exception.

open override fun prepareCommit(): Long

Expert: prepare for commit. This does the first phase of a 2-phase commit. This method does all steps necessary to commit changes since this writer was opened: flushes pending added and deleted docs, syncs the index files, and writes most of the next segments_N file. After calling this you must call either .commit to finish the commit, or .rollback to revert the commit and undo all changes done since the writer was opened.
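The two-phase protocol can be pictured with a small state model. This is a sketch under simplifying assumptions: `ToyTwoPhaseWriter` is a hypothetical class, documents are plain strings, and unlike the real writer its rollback does not close the writer.

```kotlin
// Toy model of prepareCommit / commit / rollback; not the IndexWriter implementation.
class ToyTwoPhaseWriter {
    private val committed = mutableListOf<String>()
    private val pending = mutableListOf<String>()
    private var prepared: List<String>? = null

    fun add(doc: String) {
        pending += doc
    }

    // Phase 1: snapshot the pending changes; nothing becomes visible yet.
    fun prepareCommit() {
        check(prepared == null) { "prepareCommit was already called" }
        prepared = pending.toList()
        pending.clear()
    }

    // Phase 2: publish the prepared snapshot, preparing first if needed.
    fun commit() {
        if (prepared == null) prepareCommit()
        committed += prepared!!
        prepared = null
    }

    // Discards both the prepared snapshot and any pending changes.
    fun rollback() {
        prepared = null
        pending.clear()
    }

    fun visibleDocs(): List<String> = committed.toList()
}

fun main() {
    val writer = ToyTwoPhaseWriter()
    writer.add("a")
    writer.prepareCommit()
    println(writer.visibleDocs()) // prepared but not yet published
    writer.commit()
    println(writer.visibleDocs())
    writer.add("b")
    writer.prepareCommit()
    writer.rollback()
    println(writer.visibleDocs()) // "b" was discarded
}
```

Splitting the commit this way lets a caller do the expensive work first and decide afterwards whether to finish or abandon it, for example when the index has to be committed together with another transactional resource.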

open override fun ramBytesUsed(): Long

Return the memory usage of this object in bytes. Negative values are illegal.

fun release(readersAndUpdates: ReadersAndUpdates)
open override fun rollback()

Close the IndexWriter without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called). This removes any temporary files that had been created, after which the state of the index will be the same as it was when commit() was last called or when this writer was first opened. This also clears a previous call to .prepareCommit.


Returns a string description of all segments, for debugging.


Sets the iterator to provide the commit user data map at commit time. Calling this method is considered a committable change and will be committed by .commit even if there are no other changes to this writer. Note that you must call this method before .prepareCommit. Otherwise it won't be included in the follow-on .commit.

fun setLiveCommitData(commitUserData: Iterable<MutableMap.MutableEntry<String, String>>, doIncrementVersion: Boolean)

Sets the commit user data iterator, controlling whether to advance the SegmentInfos.getVersion.

fun softUpdateDocument(term: Term?, doc: Iterable<out IndexableField>, vararg softDeletes: Field): Long

Expert: Updates a document by first updating the document(s) containing term with the given doc-values fields and then adding the new document. The doc-values update and the subsequent addition are atomic, as seen by a reader on the same index (a flush may happen only after the addition).

fun softUpdateDocuments(term: Term?, docs: Iterable<Iterable<IndexableField>>, vararg softDeletes: Field): Long

Expert: Atomically updates documents matching the provided term with the given doc-values fields and adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.


Translates a frozen packet of delete term/query, or doc values updates, into their actual docIDs in the index, and applies the change. This is a heavy operation and is done concurrently by incoming indexing threads. This method will return immediately without blocking if another thread is currently applying the packet. In order to ensure the packet has been applied, IndexWriter.forceApply must be called.

fun tryDeleteDocument(readerIn: IndexReader, docID: Int): Long

Expert: attempts to delete by document ID, as long as the provided reader is a near-real-time reader (from DirectoryReader.open). If the provided reader is an NRT reader obtained from this writer, and its segment has not been merged away, then the delete succeeds and this method returns a valid (> 0) sequence number; else, it returns -1 and the caller must then separately delete by Term or Query.
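The fallback contract described above (use the returned sequence number, or delete by Term or Query when -1 comes back) can be sketched as a small helper. `deleteWithFallback` is a hypothetical function; its two lambdas stand in for a tryDeleteDocument call and a deleteDocuments call, so that the sketch stays self-contained.

```kotlin
// Hypothetical helper modelling the fallback contract; the lambdas stand in
// for the fast by-docID delete and the slower delete by Term or Query.
fun deleteWithFallback(tryDelete: () -> Long, deleteByTerm: () -> Long): Long {
    val seqNo = tryDelete()
    // -1 signals that the fast path did not apply, so fall back.
    return if (seqNo != -1L) seqNo else deleteByTerm()
}

fun main() {
    var fallbackCalls = 0
    val fast = deleteWithFallback({ 7L }, { fallbackCalls++; 8L })
    val slow = deleteWithFallback({ -1L }, { fallbackCalls++; 8L })
    println("fast=$fast slow=$slow fallbackCalls=$fallbackCalls")
}
```

The fallback lambda runs only when the fast path reports -1, so the cost of resolving the Term or Query is paid only when it is actually needed.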

fun tryUpdateDocValue(readerIn: IndexReader, docID: Int, vararg fields: Field): Long

Expert: attempts to update doc values by document ID, as long as the provided reader is a near-real-time reader (from DirectoryReader.open). If the provided reader is an NRT reader obtained from this writer, and its segment has not been merged away, then the update succeeds and this method returns a valid (> 0) sequence number; else, it returns -1 and the caller must then resolve the document again and retry the update. If a doc-values field's data is null, the existing value is removed from all documents matching the term. This can be used to un-delete a soft-deleted document since this method will apply the field update even if the document is marked as deleted.

fun updateBinaryDocValue(term: Term, field: String, value: BytesRef): Long

Updates a document's BinaryDocValues for field to the given value. You can only update fields that already exist in the index, not add new fields through this method. You can only update fields that were indexed with doc values only.


Updates a document by first deleting the document(s) containing term and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add).


Atomically deletes documents matching the provided delTerm and adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents.

Similar to .updateDocuments, but takes a query instead of a term to identify the documents to be updated.

fun updateDocValues(term: Term, vararg updates: Field): Long

Updates documents' DocValues fields to the given values. Each field update is applied, with the same value, to the set of documents associated with the Term. All updates are atomically applied and flushed together. If a doc-values field's data is null, the existing value is removed from all documents matching the term.

fun updateNumericDocValue(term: Term, field: String, value: Long): Long

Updates a document's NumericDocValues for field to the given value. You can only update fields that already exist in the index, not add new fields through this method. You can only update fields that were indexed with doc values only.


Wait for any currently outstanding merges to finish.
