Package-level declarations
Types
A FilteredTermsEnum that enumerates terms based upon what is accepted by a DFA.
Base class for implementing CompositeReaders based on an array of sub-readers. The implementing class has to add code for correctly refcounting and closing the sub-readers.
A base TermsEnum that adds default implementations for
A per-document binary value.
Holds buffered deletes and updates, by docID, term or query, for a single segment. This is used to hold buffered pending deletes and updates against the to-be-flushed segment. Once the deletes and updates are pushed (on flush in DocumentsWriter), they are converted to a FrozenBufferedUpdates instance and pushed to the BufferedUpdatesStream.
Tracks the stream of FrozenBufferedUpdates. When DocumentsWriterPerThread flushes, its buffered deletes and updates are appended to this stream and immediately resolved (to actual docIDs, per segment) using the indexing thread that triggered the flush, for concurrency. When a merge kicks off, we sync to ensure all resolving packets complete. We also apply to all segments when an NRT reader is pulled, when commit/close is called, or when too many deletes or updates are buffered and must be flushed (by RAM usage or by count).
IndexInput that knows how to read the byte slices written by Posting and PostingVector. We read the bytes in each slice until we hit the end of that slice at which point we read the forwarding address of the next slice and then jump to it.
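The slice-chaining idea can be sketched in plain Java. This is a simplified model, not the actual pool format. Here every slice has a fixed 4-byte payload followed by a 4-byte forwarding address, whereas a real implementation may size and mark slices differently. All class and method names are hypothetical. Two streams are written interleaved, so their slices alternate in the pool, and each stream is read back by following the forwarding addresses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical layout: each slice = PAYLOAD data bytes + 4-byte big-endian forwarding address. */
final class ByteSliceSketch {
    static final int PAYLOAD = 4;
    static final int SLICE = PAYLOAD + 4;

    static final class SlicePool {
        byte[] bytes = new byte[0];

        /** Appends a zeroed slice to the pool and returns its start offset. */
        int newSlice() {
            int start = bytes.length;
            bytes = Arrays.copyOf(bytes, start + SLICE);
            return start;
        }

        void writeInt(int at, int v) {
            for (int i = 0; i < 4; i++) bytes[at + i] = (byte) (v >>> (24 - 8 * i));
        }

        int readInt(int at) {
            int v = 0;
            for (int i = 0; i < 4; i++) v = (v << 8) | (bytes[at + i] & 0xFF);
            return v;
        }
    }

    /** Appends bytes for one stream; several writers can share (and interleave in) one pool. */
    static final class SliceWriter {
        final SlicePool pool;
        final int first;      // offset of this stream's first slice
        int slice;            // offset of the slice currently being filled
        int upto;             // bytes used in the current slice

        SliceWriter(SlicePool pool) {
            this.pool = pool;
            this.first = pool.newSlice();
            this.slice = first;
        }

        void write(byte b) {
            if (upto == PAYLOAD) {                      // slice full: allocate and link the next one
                int next = pool.newSlice();
                pool.writeInt(slice + PAYLOAD, next);
                slice = next;
                upto = 0;
            }
            pool.bytes[slice + upto++] = b;
        }
    }

    /** Reads length bytes, jumping to the forwarding address at the end of each slice. */
    static byte[] readStream(SlicePool pool, int first, int length) {
        byte[] out = new byte[length];
        int slice = first, upto = 0;
        for (int i = 0; i < length; i++) {
            if (upto == PAYLOAD) {                      // end of this slice's data: follow the link
                slice = pool.readInt(slice + PAYLOAD);
                upto = 0;
            }
            out[i] = pool.bytes[slice + upto++];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = "term one".getBytes(StandardCharsets.UTF_8);
        byte[] b = "term two, longer".getBytes(StandardCharsets.UTF_8);
        SlicePool pool = new SlicePool();
        SliceWriter wa = new SliceWriter(pool), wb = new SliceWriter(pool);
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            if (i < a.length) wa.write(a[i]);
            if (i < b.length) wb.write(b[i]);
        }
        System.out.println(new String(readStream(pool, wa.first, a.length), StandardCharsets.UTF_8));
        System.out.println(new String(readStream(pool, wb.first, b.length), StandardCharsets.UTF_8));
    }
}
```

The reader needs to know how many bytes to read, because the sketch stores no end marker. A real reader is given the stream's end offset instead.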
This class provides access to per-document byte vector values indexed as KnnByteVectorField.
Basic tool and API to check the health of an index and write a new segments file that removes reference to problematic segments.
LeafReader implemented by codec APIs.
Instances of this reader type can only be used to get stored fields from the underlying LeafReaders, but it is not possible to directly retrieve postings. To do that, get the LeafReaderContext for all sub-readers via .leaves.
IndexReaderContext for CompositeReader instances.
A MergeScheduler that runs each merge using a separate thread.
This exception is thrown when Lucene detects an inconsistency in the index.
DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
Utility class to help merging documents from sub-readers according to either simple concatenated (unsorted) order, or by a specified index-time sort, skipping deleted documents and remapping non-deleted documents.
Accumulator for documents that have a value for a field. This is optimized for the case that all documents have a value.
This class accepts multiple added documents and directly writes segment files.
DocumentsWriterDeleteQueue is a non-blocking linked pending deletes queue. In contrast to other queue implementations, we only maintain the tail of the queue. A delete queue is always used in the context of a set of DWPTs and a global delete pool. Each DWPT and the global pool needs to maintain its own head of the queue (as a DeleteSlice instance per DocumentsWriterPerThread). The difference between a DWPT and the global pool is that the DWPT starts maintaining a head only once it has added its first document, since only the deletes after that document are relevant to its segment's private deletes. The global pool instead starts maintaining its head as soon as this instance is created, taking the sentinel instance as its initial head.
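A minimal sketch of a tail-only queue with per-consumer slices may make the head/tail bookkeeping more concrete. It assumes plain strings stand in for deletes, and every name (DeleteQueueSketch, Slice, advance) is hypothetical. The append is a standard lock-free linked-list append. Each slice remembers where its owner last looked, so the global pool's slice starts at the sentinel, while a DWPT's slice would be created when the DWPT adds its first document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical tail-only queue: only the tail is shared; each consumer tracks its own head. */
final class DeleteQueueSketch<T> {
    static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E item) { this.item = item; }
    }

    /** A consumer's private window: items after head, up to and including tail. */
    static final class Slice<E> {
        Node<E> head, tail;
        Slice(Node<E> start) { head = start; tail = start; }
    }

    private final AtomicReference<Node<T>> tail = new AtomicReference<>(new Node<T>(null)); // sentinel

    /** Lock-free append (Michael-Scott style): CAS the link, then try to swing the tail. */
    void add(T item) {
        Node<T> node = new Node<>(item);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next != null) {                      // another thread linked a node but has not moved tail
                tail.compareAndSet(last, next);
            } else if (last.next.compareAndSet(null, node)) {
                tail.compareAndSet(last, node);
                return;
            }
        }
    }

    /** A new slice only sees items added after this call. */
    Slice<T> newSlice() { return new Slice<>(tail.get()); }

    /** Moves the slice forward to the current tail and returns the items it newly covers. */
    List<T> advance(Slice<T> slice) {
        slice.tail = tail.get();
        List<T> items = new ArrayList<>();
        for (Node<T> n = slice.head; n != slice.tail; ) {
            n = n.next.get();
            items.add(n.item);
        }
        slice.head = slice.tail;
        return items;
    }

    public static void main(String[] args) {
        DeleteQueueSketch<String> queue = new DeleteQueueSketch<>();
        Slice<String> global = queue.newSlice();     // global pool: starts at the sentinel
        queue.add("delete id:1");
        Slice<String> dwpt = queue.newSlice();       // DWPT: starts at its first document
        queue.add("delete id:2");
        System.out.println(queue.advance(global) + " " + queue.advance(dwpt));
    }
}
```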
This class controls DocumentsWriterPerThread flushing during indexing. It tracks the memory consumption per DocumentsWriterPerThread and uses a configured FlushPolicy to decide if a DocumentsWriterPerThread must flush.
DocumentsWriterPerThreadPool controls DocumentsWriterPerThread instances and their thread assignments during indexing. Each DocumentsWriterPerThread is, once obtained from the pool, exclusively used for indexing a single document or list of documents by the obtaining thread. Each indexing thread must obtain such a DocumentsWriterPerThread to make progress. Depending on the DocumentsWriterPerThreadPool implementation, DocumentsWriterPerThread assignments might differ from document to document.
Controls the health status of DocumentsWriter sessions. This class is used to block incoming indexing threads if flushing is significantly slower than indexing, to ensure the DocumentsWriter's health. If flushing is significantly slower than indexing, the net memory used within an IndexWriter session can increase very quickly and easily exceed the JVM's available memory.
Holds updates of a single DocValues field, for a set of documents within one segment.
Options for skip indexes on doc values.
Skipper for DocValues.
DocValues types. Note that DocValues is strongly typed, so a field cannot have different types across different documents.
An in-place update to a DocValues field.
Abstract base class implementing a DocValuesProducer that has no doc values.
The ExitableDirectoryReader wraps a real index DirectoryReader and allows for a QueryTimeout implementation object to be checked periodically to see if the thread should exit or not. If QueryTimeout.shouldExit returns true, an ExitingReaderException is thrown.
Access to the Field Info file that describes document fields and whether or not they are indexed. Each segment has a separate Field Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object.
This class tracks the number and position / offset parameters of terms being added to the index. The information collected in this class is also used to calculate the normalization factor for a field.
Iterates over terms across multiple fields. The caller must check .field after each .next to see if the field changed, but == can be used, since the iterator implementation ensures it will use the same String instance for a given field.
This class efficiently buffers numeric and binary field updates and stores terms, values and metadata in a memory-efficient way without creating large amounts of objects. Update terms are stored without de-duplicating the update term. In general, we try to optimize for several use cases. For instance, we try to use constant space for the update term's field, since the common case always updates on the same field. For docUpTo, we also try to optimize for the case where updates should be applied to all docs, i.e. docUpTo=Integer.MAX_VALUE; in other cases, each update will likely have a different docUpTo. Along the same lines, this implementation optimizes for the case where all updates have a value. Lastly, if all updates share the same value for a numeric field, we only store the value once.
Delegates all methods to a wrapped BinaryDocValues.
A FilterCodecReader contains another CodecReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
A FilterDirectoryReader wraps another DirectoryReader, allowing implementations to transform or extend it.
Abstract class for enumerating a subset of all terms.
A FilterLeafReader contains another LeafReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterLeafReader itself simply implements all abstract methods of IndexReader with versions that pass all requests to the contained index reader. Subclasses of FilterLeafReader may further override some of these methods and may also provide additional methods and fields.
A wrapper for MergePolicy instances.
Delegates all methods to a wrapped NumericDocValues.
Delegates all methods to a wrapped SortedDocValues.
Delegates all methods to a wrapped SortedNumericDocValues.
Delegates all methods to a wrapped SortedSetDocValues.
This class provides access to per-document floating point vector values indexed as KnnFloatVectorField.
FlushPolicy controls when segments are flushed from a RAM-resident internal data structure to the IndexWriter's Directory.
Holds buffered deletes and updates by term or query, once pushed. Pushed deletes/updates are write-once, so we shift to more memory efficient data structure to hold them. We don't hold docIDs because these are applied on flush.
Extension of PostingsEnum which also provides information about upcoming impacts.
Source of Impacts.
Represents a single field for indexing. IndexWriter consumes Iterable
Describes the properties of a field.
Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader.
Expert: policy for deletion of stale index commits.
This class contains useful constants representing filenames and extensions used by Lucene, as well as convenience methods for querying whether a file name matches an extension (.matchesExtension) and for generating file names from a segment name, generation and extension (.fileNameFromGeneration, .segmentFileName).
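The naming convention can be approximated in a few lines. This is a sketch, not the library's code. The method names mirror the ones cited above, but the bodies are my own reconstruction. They assume the scheme Lucene is known to use: a generation of -1 means "no file", generation 0 adds no suffix, and other generations are appended in base 36 (which is why commit points look like `segments_a`).

```java
/** Approximation of the file-naming helpers described above (bodies are a sketch). */
final class FileNameSketch {
    /** ("_0", "", "cfs") -> "_0.cfs"; a non-empty suffix is joined with '_'. */
    static String segmentFileName(String segmentName, String segmentSuffix, String ext) {
        StringBuilder sb = new StringBuilder(segmentName);
        if (!segmentSuffix.isEmpty()) sb.append('_').append(segmentSuffix);
        if (!ext.isEmpty()) sb.append('.').append(ext);
        return sb.toString();
    }

    /** -1 means no file; 0 means no generation suffix; otherwise the generation in base 36. */
    static String fileNameFromGeneration(String base, String ext, long gen) {
        if (gen == -1) return null;
        if (gen == 0) return segmentFileName(base, "", ext);
        String name = base + "_" + Long.toString(gen, Character.MAX_RADIX);
        return ext.isEmpty() ? name : name + "." + ext;
    }

    static boolean matchesExtension(String fileName, String ext) {
        return fileName.endsWith("." + ext);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration("segments", "", 10));
        System.out.println(fileNameFromGeneration("_0", "liv", 3));
        System.out.println(segmentFileName("_0", "", "cfs"));
    }
}
```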
This exception is thrown when Lucene detects an index that is newer than this Lucene version.
This exception is thrown when Lucene detects an index that is too old for this Lucene version.
Default general purpose indexing chain, which handles indexing of all types of fields.
Signals that no index was found in the Directory. This may be because the directory is empty, but it can also indicate index corruption.
Controls how much information is stored in the postings lists.
IndexReader is an abstract class, providing an interface for accessing a point-in-time view of an index. Any changes made to the index via IndexWriter will not be visible until a new IndexReader is opened. It's best to use DirectoryReader.open to obtain an IndexReader, if your IndexWriter is in-process. When you need to re-open to see changes to the index, it's best to use since the new reader will share resources with the previous one when possible. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.
A struct-like class that represents a hierarchical relationship between IndexReader instances.
Handles how documents should be sorted in an index, both within a segment and between segments.
An IndexWriter creates and maintains an index.
Holds all the configuration that is used to create an IndexWriter. Once an IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. For that, use LiveIndexWriterConfig that is returned from IndexWriter.getConfig.
A callback event listener for recording key events that happen inside IndexWriter.
An IndexDeletionPolicy implementation that keeps only the most recent commit and immediately removes all prior commits after a new commit is done. This is the default deletion policy.
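The keep-only-last behaviour reduces to "delete everything but the newest commit". The sketch below uses a hypothetical Commit stand-in rather than the real commit-point type. It assumes commits arrive sorted oldest first, which is how the deletion-policy callbacks are documented to pass them.

```java
import java.util.ArrayList;
import java.util.List;

final class KeepLastSketch {
    /** Hypothetical stand-in for a commit point; delete() only records the decision. */
    static final class Commit {
        final long generation;
        boolean deleted;
        Commit(long generation) { this.generation = generation; }
        void delete() { deleted = true; }
    }

    /** Assumes commits are sorted oldest first; deletes every commit except the most recent. */
    static void onCommit(List<Commit> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
    }

    public static void main(String[] args) {
        List<Commit> commits = new ArrayList<>();
        for (long gen = 1; gen <= 3; gen++) commits.add(new Commit(gen));
        onCommit(commits);
        for (Commit c : commits) System.out.println("generation " + c.generation + " deleted=" + c.deleted);
    }
}
```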
This class abstracts addressing of document vector values indexed as KnnFloatVectorField or KnnByteVectorField.
Provides read-only metadata about a leaf.
LeafReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable. IndexReaders implemented by this subclass do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings.
IndexReaderContext for LeafReader instances.
Holds all the configuration used by IndexWriter with few setters for settings that can be changed on an IndexWriter instance "live".
This is a LogMergePolicy that measures size of a segment as the total byte size of the segment's files.
This is a LogMergePolicy that measures size of a segment as the number of documents (not taking deletions into account).
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using .getMergeFactor and .setMergeFactor respectively.
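The level idea can be illustrated with a deliberately simplified selector. It is not the real policy's algorithm, which uses fractional levels with a tolerance span. This sketch buckets each segment into an integer level (how many times its size can be divided by the merge factor), groups adjacent segments on the same level, and proposes a merge for every full group of mergeFactor segments. The names are hypothetical, and mergeFactor is assumed to be at least 2.

```java
import java.util.ArrayList;
import java.util.List;

final class LogLevelSketch {
    /** Integer level: how often size can be divided by mergeFactor (>= 2) before dropping below it. */
    static int level(long size, int mergeFactor) {
        int lvl = 0;
        for (long s = Math.max(size, 1); s >= mergeFactor; s /= mergeFactor) lvl++;
        return lvl;
    }

    /** Groups adjacent same-level segments and proposes [start, end) merges of mergeFactor segments. */
    static List<int[]> findMerges(long[] sizes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int start = 0;
        while (start < sizes.length) {
            int lvl = level(sizes[start], mergeFactor);
            int end = start + 1;
            while (end < sizes.length && level(sizes[end], mergeFactor) == lvl) end++;
            for (int i = start; i + mergeFactor <= end; i += mergeFactor) {
                merges.add(new int[] {i, i + mergeFactor});   // a full level's worth of segments
            }
            start = end;
        }
        return merges;
    }

    public static void main(String[] args) {
        long[] sizes = {5, 7, 3, 50, 60, 70, 80, 900};
        for (int[] m : findMerges(sizes, 3)) System.out.println("merge segments " + m[0] + ".." + (m[1] - 1));
    }
}
```

Integer division is used instead of `Math.log` ratios so that exact powers such as 1000 with a factor of 10 land on level 3 without floating-point rounding surprises.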
A Fields implementation that merges multiple Fields into one, and maps around deleted documents. This is used for merging.
Expert: a MergePolicy determines the sequence of primitive merge operations.
This is the RateLimiter that IndexWriter assigns to each running merge, to give MergeSchedulers ionice-like control.
Expert: IndexWriter uses an instance implementing this interface to execute the merges selected by a MergePolicy. The default MergeScheduler is ConcurrentMergeScheduler.
Holds common state used during segment merging.
MergeTrigger is passed to MergePolicy.findMerges to indicate the event that triggered the merge.
A wrapper for CompositeIndexReader providing access to DocValues.
Provides a single Fields term index view over an IndexReader. This is useful when you're interacting with an IndexReader implementation that consists of sequential sub-readers (e.g. DirectoryReader or MultiReader) and you must treat it as a LeafReader.
Exposes PostingsEnum, merged from PostingsEnum API of sub-segments.
A CompositeReader which reads multiple indexes, appending their content. It can be used to create a view on several sub-readers (like DirectoryReader) and execute searches on it.
Exposes flex API, merged from flex API of sub-segments.
An IndexDeletionPolicy which keeps all index commits around, never deleting them. This class is a singleton and can be accessed by referencing .INSTANCE.
A MergePolicy which never returns merges to execute. Use it if you want to prevent segment merges.
A MergeScheduler which never executes any merges. It is also a singleton and can be accessed through NoMergeScheduler.INSTANCE. Use it if you want to prevent an IndexWriter from ever executing merges, regardless of the MergePolicy used. Note that you can achieve the same thing by using NoMergePolicy, however with NoMergeScheduler you also ensure that no unnecessary code of any MergeScheduler implementation is ever executed. Hence it is recommended to use both if you want to disable merges from ever happening.
A per-document numeric value.
A wrapping merge policy that wraps the MergePolicy.OneMerge objects returned by the wrapped merge policy.
Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints representation.
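The mapping itself can be shown without packed ints. This hypothetical sketch takes each segment's sorted, unique terms and builds the global sorted term list. It then records, for every segment ordinal, the matching global ordinal. It uses plain arrays and a TreeSet rather than a compact encoding, and String ordering (which matches byte order only for ASCII terms). The method names are illustrative, not the library's signatures.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class OrdinalMapSketch {
    final String[] globalTerms;        // global ordinal -> term
    final long[][] segmentToGlobal;    // [segment][segment ordinal] -> global ordinal

    /** Each segment's terms must already be sorted and unique, as segment ordinals are. */
    OrdinalMapSketch(String[][] segmentTerms) {
        TreeSet<String> union = new TreeSet<>();
        for (String[] terms : segmentTerms) union.addAll(Arrays.asList(terms));
        globalTerms = union.toArray(new String[0]);
        segmentToGlobal = new long[segmentTerms.length][];
        for (int s = 0; s < segmentTerms.length; s++) {
            segmentToGlobal[s] = new long[segmentTerms[s].length];
            for (int ord = 0; ord < segmentTerms[s].length; ord++) {
                segmentToGlobal[s][ord] = Arrays.binarySearch(globalTerms, segmentTerms[s][ord]);
            }
        }
    }

    long getGlobalOrd(int segment, int segmentOrd) { return segmentToGlobal[segment][segmentOrd]; }

    long getValueCount() { return globalTerms.length; }

    public static void main(String[] args) {
        OrdinalMapSketch map = new OrdinalMapSketch(new String[][] {
            {"apple", "cherry"}, {"banana", "cherry", "date"}});
        System.out.println(Arrays.toString(map.globalTerms) + " segment 1, ord 2 -> " + map.getGlobalOrd(1, 2));
    }
}
```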
An ordinal-based TermState.
A CompositeReader which reads multiple, parallel indexes. Each index added must have the same number of documents, and exactly the same number of leaves (with equal maxDoc), but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.
A LeafReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.
This class handles accounting and applying pending deletes for live segment readers.
A SnapshotDeletionPolicy which adds a persistence layer so that snapshots can be maintained across the life of an application. The snapshots are persisted in a Directory and are committed as soon as snapshot or release is called.
Access to indexed numeric values.
Iterates through the postings. NOTE: you must first call .nextDoc before using any of the per-doc methods.
Prefix codes term instances (prefixes are shared). This is expected to be faster to build than an FST and might also be more compact if there are no common suffixes.
Query timeout abstraction that controls whether a query should continue or be stopped. Can be set on the searcher through org.gnit.lucenekmp.search.IndexSearcher.setTimeout, in which case bulk scoring will be time-bound. Can also be used in combination with ExitableDirectoryReader.
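Prefix sharing (front coding) over sorted terms can be sketched as storing, for each term, the length of the prefix it shares with the previous term plus the remaining suffix. This is a generic illustration with hypothetical names. The real class also encodes field changes and uses a byte-level format that this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixCodingSketch {
    static final class Entry {
        final int prefixLen;    // chars shared with the previous term
        final String suffix;    // chars that differ
        Entry(int prefixLen, String suffix) { this.prefixLen = prefixLen; this.suffix = suffix; }
    }

    /** Terms must be sorted so that neighbours share long prefixes. */
    static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int p = 0, max = Math.min(prev.length(), term.length());
            while (p < max && prev.charAt(p) == term.charAt(p)) p++;
            out.add(new Entry(p, term.substring(p)));
            prev = term;
        }
        return out;
    }

    /** Rebuilds each term from the previous term's prefix plus the stored suffix. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            prev = prev.substring(0, e.prefixLen) + e.suffix;
            out.add(prev);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("field:apple", "field:apply", "field:banana");
        for (Entry e : encode(terms)) System.out.println(e.prefixLen + " + \"" + e.suffix + "\"");
        System.out.println(decode(encode(terms)));
    }
}
```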
An implementation of QueryTimeout that can be used by the ExitableDirectoryReader class to time out and exit out when a query takes a long time to rewrite.
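A deadline-based timeout and a loop that checks it periodically capture the pattern described in the two entries above. This is a sketch with hypothetical names (Timeout, DeadlineTimeout, ExitingException, countTerms), not the library's types. The clock is injectable so that tests can control time. The check interval (checkEvery) is an assumed knob that trades responsiveness for overhead.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

final class TimeoutSketch {
    /** Hypothetical stand-in for the query-timeout abstraction. */
    interface Timeout { boolean shouldExit(); }

    /** Expires once the clock passes start + budget; the clock is injectable for testing. */
    static final class DeadlineTimeout implements Timeout {
        private final LongSupplier clock;
        private final long deadline;

        DeadlineTimeout(long budgetNanos, LongSupplier clock) {
            this.clock = clock;
            this.deadline = clock.getAsLong() + budgetNanos;
        }

        @Override public boolean shouldExit() {
            return clock.getAsLong() - deadline > 0;   // overflow-safe, as recommended for System.nanoTime
        }
    }

    static final class ExitingException extends RuntimeException {
        ExitingException(String message) { super(message); }
    }

    /** Walks the terms, consulting the timeout every checkEvery items, and aborts when it fires. */
    static int countTerms(Iterable<String> terms, Timeout timeout, int checkEvery) {
        int n = 0;
        for (String ignored : terms) {
            if (n % checkEvery == 0 && timeout.shouldExit()) {
                throw new ExitingException("timed out after " + n + " terms");
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Timeout oneSecond = new DeadlineTimeout(1_000_000_000L, System::nanoTime);
        System.out.println(countTerms(Arrays.asList("apple", "banana", "cherry"), oneSecond, 1));
    }
}
```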
Utility class to safely share DirectoryReader instances across multiple threads, while periodically reopening. This class ensures each reader is closed only once all threads have finished using it.
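The "closed only once all threads have finished" guarantee comes from reference counting. In this hypothetical sketch (Reader, Manager, swap are illustrative names, not the library's API), the manager owns one reference. Every acquire adds one, every release drops one, and swapping in a newer reader drops the manager's reference to the old one, so the old reader closes when its last user releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountSketch {
    /** Hypothetical reader whose "close" just sets a flag when the count reaches zero. */
    static final class Reader {
        final int version;
        final AtomicInteger refCount = new AtomicInteger(1);   // the manager owns the first reference
        volatile boolean closed;

        Reader(int version) { this.version = version; }

        void incRef() {
            while (true) {
                int count = refCount.get();
                if (count <= 0) throw new IllegalStateException("reader already closed");
                if (refCount.compareAndSet(count, count + 1)) return;
            }
        }

        void decRef() {
            if (refCount.decrementAndGet() == 0) closed = true;   // reached exactly once
        }
    }

    static final class Manager {
        private Reader current;

        Manager(Reader initial) { current = initial; }

        /** Returns the current reader with its count bumped; the caller must release it. */
        synchronized Reader acquire() {
            current.incRef();
            return current;
        }

        void release(Reader reader) { reader.decRef(); }

        /** Publishes a newer reader (e.g. after a reopen) and drops the manager's old reference. */
        synchronized void swap(Reader newer) {
            Reader old = current;
            current = newer;
            old.decRef();
        }
    }

    public static void main(String[] args) {
        Reader first = new Reader(1);
        Manager manager = new Manager(first);
        Reader inUse = manager.acquire();
        manager.swap(new Reader(2));
        System.out.println("closed while in use? " + first.closed);
        manager.release(inUse);
        System.out.println("closed after release? " + first.closed);
    }
}
```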
Subreader slice from a parent composite reader.
Common util methods for dealing with IndexReaders and IndexReaderContexts.
Embeds a read-only SegmentInfo and adds per-commit fields.
Holds core readers that are shared (unchanged) when SegmentReader is cloned or reopened.
Manages the DocValuesProducer held by SegmentReader and keeps track of their reference counts.
Information about a segment such as its name, directory, and files related to the segment.
A collection of segmentInfo objects with methods for operating on those segments in relation to the file system.
IndexReader implementation over a single segment.
Holder class for common parameters used during read.
Holder class for common parameters used during write.
A MergeScheduler that simply does each merge sequentially, using the current thread.
A very simple merged segment warmer that just ensures data structures are initialized.
Subclass of FilteredTermsEnum for enumerating a single term.
Wraps arbitrary readers for merging. Note that this can cause slow and memory-intensive merges. Consider using FilterCodecReader instead.
ImpactsEnum that doesn't index impacts but implements the API in a legal way. This is typically used for short postings that do not need skipping.
An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the ability to hold and later release snapshots of an index. While a snapshot is held, the IndexWriter will not remove any files associated with it, even if the index is otherwise being actively and arbitrarily changed. Because we wrap another arbitrary IndexDeletionPolicy, this gives you the freedom to continue using whatever IndexDeletionPolicy you would normally want to use with your index.
This reader filters out documents that have a doc-values value in the given field and treats these documents as soft-deleted. Hard deleted documents will also be filtered out in the live docs of this reader.
This MergePolicy allows soft-deleted documents to be carried over across merges. The policy wraps the merge reader and marks documents as "live" if they have a value in the soft-delete field and match the provided query. This makes it possible, for instance, to keep documents alive based on time or any other constraint in the index. The main purpose of this merge policy is to implement retention policies that govern when document modifications vanish from the index. Using this merge policy lets you control when soft deletes are claimed by merges.
A per-document byte[] with presorted values. This is fundamentally an iterator over the int ord values per document, with random access APIs to resolve an int ord to BytesRef.
A list of per-document numeric values, sorted according to Long.compare.
A multi-valued version of SortedDocValues.
Reads/writes a named SortField from a segment info file, used to record index sorts.
A CodecReader which supports sorting documents by a given Sort. This can be used to re-sort an index after it's been created by wrapping all readers of the index with this reader and adding it to a fresh IndexWriter via . NOTE: This reader should only be used for merging. Pulling fields from this reader might be very costly and memory intensive.
Default implementation of DirectoryReader.
A fixed size DataInput which includes the length of the input. For use as a StoredField.
API for reading stored fields.
A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occurred in.
Iterator to seek (.seekCeil, .seekExact) or step through (.next) terms to obtain frequency information (.docFreq) or a PostingsEnum for the current term (.postings).
This class is passed each token produced by the analyzer on each field during indexing, and it stores these tokens in a hash table, and allocates separate byte streams per token. Consumers of this class, e.g. FreqProxTermsWriter and TermVectorsConsumer, write their own byte streams under each term.
This class stores streams of information per term without knowing the size of the stream ahead of time. Each stream typically encodes one level of information, like term frequency per document or term proximity. Internally, this class allocates a linked list of slices that can be read by a ByteSliceReader for each term. Terms are first deduplicated in a BytesRefHash; once this is done, internal data structures point to the current offset of each stream that can be written to.
Maintains an IndexReader TermState view over IndexReader instances containing a single term. The TermStates doesn't track whether the given TermState objects are valid, nor whether the TermState instances refer to the same terms in the associated readers.
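The seek/step contract can be modelled over a sorted in-memory map. In this hypothetical sketch, seekCeil lands on the smallest term greater than or equal to the target and reports whether it matched exactly (FOUND), landed on a later term (NOT_FOUND), or ran past the last term (END). A fresh enum's next() returns the first term. The map of term to docFreq stands in for a real term dictionary, and postings are omitted.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical terms enum over a sorted term -> docFreq map. */
final class TermsEnumSketch {
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final NavigableMap<String, Integer> docFreqs;
    private String current;
    private boolean started, ended;

    TermsEnumSketch(NavigableMap<String, Integer> docFreqs) { this.docFreqs = docFreqs; }

    /** Positions on the smallest term >= target. */
    SeekStatus seekCeil(String target) {
        started = true;
        current = docFreqs.ceilingKey(target);
        ended = current == null;
        if (ended) return SeekStatus.END;
        return current.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
    }

    boolean seekExact(String target) { return seekCeil(target) == SeekStatus.FOUND; }

    /** Steps to the next term (the first term on a fresh enum); returns null when exhausted. */
    String next() {
        if (ended) return null;
        current = started ? docFreqs.higherKey(current)
                          : (docFreqs.isEmpty() ? null : docFreqs.firstKey());
        started = true;
        ended = current == null;
        return current;
    }

    String term() { return current; }

    /** Only valid while positioned on a term. */
    int docFreq() { return docFreqs.get(current); }

    public static void main(String[] args) {
        TreeMap<String, Integer> dict = new TreeMap<>();
        dict.put("apple", 3);
        dict.put("banana", 1);
        dict.put("cherry", 2);
        TermsEnumSketch te = new TermsEnumSketch(dict);
        System.out.println(te.seekCeil("b") + " -> " + te.term() + " (docFreq " + te.docFreq() + ")");
        for (String t = te.next(); t != null; t = te.next()) System.out.println("next: " + t);
    }
}
```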
API for reading term vectors.
Merges segments of approximately equal size, subject to an allowed number of segments per tier. This is similar to LogByteSizeMergePolicy, except this merge policy is able to merge non-adjacent segments. This merge policy also does not over-merge (i.e. cascade merges).
An interface for implementations that support 2-phase commit. You can use TwoPhaseCommitTool to execute a 2-phase commit algorithm over several TwoPhaseCommits.
A utility for executing 2-phase commit on several objects.
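The two phases can be sketched as "prepare everyone, then commit everyone, and roll everyone back if anything fails". The Participant interface and Recording class are hypothetical stand-ins. They use unchecked exceptions for brevity, while real participants typically throw I/O exceptions. Rolling back on a commit-phase failure as well as a prepare-phase failure is this sketch's choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TwoPhaseSketch {
    /** Hypothetical stand-in for a two-phase-commit participant. */
    interface Participant {
        void prepareCommit();
        void commit();
        void rollback();
    }

    /** Phase 1 prepares every participant, phase 2 commits them; any failure rolls back all. */
    static void execute(List<? extends Participant> participants) {
        try {
            for (Participant p : participants) p.prepareCommit();
            for (Participant p : participants) p.commit();
        } catch (RuntimeException e) {
            for (Participant p : participants) {
                try {
                    p.rollback();
                } catch (RuntimeException suppressed) {
                    e.addSuppressed(suppressed);   // keep the original failure as the primary error
                }
            }
            throw e;
        }
    }

    /** Records every call; can be told to fail during prepare. */
    static final class Recording implements Participant {
        private final String name;
        private final boolean failOnPrepare;
        private final List<String> log;

        Recording(String name, boolean failOnPrepare, List<String> log) {
            this.name = name;
            this.failOnPrepare = failOnPrepare;
            this.log = log;
        }

        @Override public void prepareCommit() {
            if (failOnPrepare) throw new IllegalStateException(name + " could not prepare");
            log.add(name + ":prepare");
        }

        @Override public void commit() { log.add(name + ":commit"); }

        @Override public void rollback() { log.add(name + ":rollback"); }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        execute(Arrays.asList(new Recording("index", false, log), new Recording("taxonomy", false, log)));
        System.out.println(log);
    }
}
```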
This MergePolicy is used for upgrading all existing segments of an index when calling IndexWriter.forceMerge. All other methods delegate to the base MergePolicy given to the constructor. This allows for an as-cheap-as-possible upgrade of an older index by only upgrading segments that were created by previous Lucene versions. forceMerge no longer really merges; it is just used to "forceMerge" older segment versions away.
The numeric datatype of the vector values.
Vector similarity function; used in search to return top K most similar vectors to a target vector. This is a label describing the method used during indexing and searching of the vectors in order to determine the nearest neighbors.
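The raw measures behind common similarity choices (dot product, cosine, squared Euclidean distance) are short to write down. The hypothetical topK helper ranks documents by cosine with a brute-force scan. It is for illustration only; the index uses its own data structures for nearest-neighbor search, and the score scaling applied on top of these raw values is not shown. Cosine assumes non-zero vectors of equal length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SimilaritySketch {
    static float dot(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Assumes neither vector is all zeros. */
    static float cosine(float[] a, float[] b) {
        return (float) (dot(a, b) / Math.sqrt((double) dot(a, a) * dot(b, b)));
    }

    static float squareDistance(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Brute-force top-K by cosine similarity; returns document indices, most similar first. */
    static List<Integer> topK(float[][] docs, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) ids.add(i);
        ids.sort(Comparator.comparingDouble((Integer i) -> -cosine(docs[i], query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        float[][] docs = {{1, 0}, {0, 1}, {0.9f, 0.1f}};
        System.out.println(topK(docs, new float[] {1, 0}, 2));
    }
}
```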
Streams vector values for indexing to the given codec's vectors writer. The codec's vectors writer is responsible for buffering and processing vectors.