core/org.gnit.lucenekmp.index/LogMergePolicy

LogMergePolicy

abstract class LogMergePolicy : MergePolicy

This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor through the mergeFactor property.
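The level idea can be sketched in a few lines. This is a simplified model, not this library's code. The helper name segmentLevels is hypothetical. It assumes, as upstream Lucene does, that a segment's level is the logarithm of its size in base mergeFactor, and that every segment smaller than minMergeSize is placed on one shared bottom level.

```kotlin
import kotlin.math.ln
import kotlin.math.max

// Hypothetical helper (not part of lucenekmp): assigns each segment a "level" equal to
// log base mergeFactor of its size, so segments within a factor of mergeFactor of each
// other land on roughly the same level. Sizes below minMergeSize are floored to one level.
fun segmentLevels(sizes: List<Long>, mergeFactor: Int, minMergeSize: Long): List<Double> {
    require(mergeFactor >= 2) { "mergeFactor must be at least 2" }
    val norm = ln(mergeFactor.toDouble())
    // Level of the smallest "interesting" size; everything below it shares this level.
    val floor = ln(max(minMergeSize, 1L).toDouble()) / norm
    // Sizes below 1 are treated as 1 so the logarithm stays defined.
    return sizes.map { size -> max(ln(max(size, 1L).toDouble()) / norm, floor) }
}
```

With mergeFactor = 10, segments of 1, 10, 100 and 1000 units end up on levels of roughly 0, 1, 2 and 3.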

This class is abstract and requires a subclass to define the size method, which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by the document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
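The following sketch shows how the work is split: the base class leaves size abstract, and each subclass picks a measure. All names here are hypothetical stand-ins. SegmentStats replaces SegmentCommitInfo, and the sketch leaves out the MergeContext parameter that the real size method takes.

```kotlin
// Hypothetical, simplified stand-in for a segment; the real policies read SegmentCommitInfo.
data class SegmentStats(val maxDoc: Int, val delCount: Int, val sizeInBytes: Long)

// Mirrors the shape of LogMergePolicy: the base class leaves "size" to subclasses.
abstract class SizeMeasure {
    abstract fun size(segment: SegmentStats): Long
}

// Analogous to LogDocMergePolicy: size is the segment's document count.
class DocCountMeasure : SizeMeasure() {
    override fun size(segment: SegmentStats): Long = segment.maxDoc.toLong()
}

// Analogous to LogByteSizeMergePolicy: size is the total byte size of the segment's files.
class ByteSizeMeasure : SizeMeasure() {
    override fun size(segment: SegmentStats): Long = segment.sizeInBytes
}
```

Both measures plug into the same level logic. Only the unit of "size" differs.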

NOTE: This policy returns natural merges whose size is below minMergeSize for findFullFlushMerges.

Inheritors

LogByteSizeMergePolicy

LogDocMergePolicy

Constructors

LogMergePolicy

constructor()

Types

Companion

object Companion

Properties

calibrateSizeByDeletes

var calibrateSizeByDeletes: Boolean

If true, we pro-rate a segment's size by the percentage of non-deleted documents.
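A minimal sketch of the pro-rating arithmetic, assuming it works as in upstream Lucene: the raw size is multiplied by the fraction of documents that are not deleted. The function name proratedSize is hypothetical.

```kotlin
// Hypothetical helper: scales a raw segment size by the fraction of live (non-deleted)
// documents, which is what calibrateSizeByDeletes = true asks the policy to do.
fun proratedSize(rawSize: Long, maxDoc: Int, delCount: Int, calibrateSizeByDeletes: Boolean): Long {
    // Nothing to scale when calibration is off or the segment has no documents.
    if (!calibrateSizeByDeletes || maxDoc <= 0) return rawSize
    val delRatio = delCount.toDouble() / maxDoc
    return (rawSize * (1.0 - delRatio)).toLong()
}
```

For example, a 1000-byte segment with 25 of its 100 documents deleted counts as 750 bytes when calibration is on.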

maxCFSSegmentSizeMB

open var maxCFSSegmentSizeMB: Double

maxMergeDocs

var maxMergeDocs: Int

If a segment has more than this many documents then it will never be merged.

maxMergeSize

var maxMergeSize: Long

If the size of a segment exceeds this value then it will never be merged.
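Taken together, maxMergeDocs and maxMergeSize act as an eligibility filter. The sketch below is a hypothetical check that follows the two descriptions above: a segment that exceeds either limit is never selected for merging.

```kotlin
// Hypothetical check modelled on maxMergeDocs / maxMergeSize: a segment that exceeds
// either limit is left alone by the policy; otherwise it may take part in a merge.
fun isMergeable(docCount: Int, size: Long, maxMergeDocs: Int, maxMergeSize: Long): Boolean =
    docCount <= maxMergeDocs && size <= maxMergeSize
```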

mergeFactor

var mergeFactor: Int

How many segments to merge at a time.

minMergeSize

var minMergeSize: Long

Any segments whose size is smaller than this value will be candidates for full-flush merges and merged more aggressively.

noCFSRatio

open var noCFSRatio: Double

targetSearchConcurrency

var targetSearchConcurrency: Int

Target search concurrency. This merge policy will avoid creating segments that have more than maxDoc / targetSearchConcurrency documents.
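The cap works out to simple arithmetic. This sketch uses plain integer division for maxDoc / targetSearchConcurrency. How this port rounds is an assumption, and both helper names are hypothetical.

```kotlin
// Hypothetical helper: the largest doc count a merged segment may reach so that the index
// can still be split into about targetSearchConcurrency slices (integer division assumed).
fun maxDocsPerMergedSegment(totalMaxDoc: Int, targetSearchConcurrency: Int): Int {
    require(targetSearchConcurrency >= 1) { "targetSearchConcurrency must be >= 1" }
    return totalMaxDoc / targetSearchConcurrency
}

// Hypothetical check: would merging these segments stay within the cap?
fun mergeRespectsConcurrency(segmentDocCounts: List<Int>, totalMaxDoc: Int, targetSearchConcurrency: Int): Boolean =
    segmentDocCounts.sum() <= maxDocsPerMergedSegment(totalMaxDoc, targetSearchConcurrency)
```

With 1,000,000 documents and targetSearchConcurrency = 8, merged segments stay at or below 125,000 documents.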

Functions

findForcedDeletesMerges

open override fun findForcedDeletesMerges(segmentInfos: SegmentInfos?, mergeContext: MergePolicy.MergeContext?): MergePolicy.MergeSpecification

Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.
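The grouping described above can be sketched over a list of flags, one per segment, in index order. This is a simplified model that assumes the upstream Lucene loop. hasDeletes and planForcedDeletesMerges are hypothetical names, and the real method returns a MergePolicy.MergeSpecification rather than index ranges.

```kotlin
// Sketch: walk segments in order, collect runs of adjacent segments that have deletions,
// and cut each run into merges of at most mergeFactor segments. hasDeletes[i] is a
// hypothetical stand-in for "segment i has deleted documents".
fun planForcedDeletesMerges(hasDeletes: List<Boolean>, mergeFactor: Int): List<IntRange> {
    require(mergeFactor >= 2) { "mergeFactor must be at least 2" }
    val merges = mutableListOf<IntRange>()
    var runStart = -1 // first segment of the current run, or -1 when no run is open
    for (i in hasDeletes.indices) {
        if (hasDeletes[i]) {
            if (runStart == -1) {
                runStart = i
            } else if (i - runStart == mergeFactor) {
                // The run already holds mergeFactor segments: emit it and start a new run here.
                merges.add(runStart until i)
                runStart = i
            }
        } else if (runStart != -1) {
            // A segment without deletions closes the current run.
            merges.add(runStart until i)
            runStart = -1
        }
    }
    if (runStart != -1) merges.add(runStart until hasDeletes.size)
    return merges
}
```

A lone segment with deletions still gets its own single-segment merge, because rewriting it is what expunges the deletes.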

findForcedMerges

open override fun findForcedMerges(infos: SegmentInfos?, maxNumSegments: Int, segmentsToMerge: MutableMap<SegmentCommitInfo, Boolean>?, mergeContext: MergePolicy.MergeContext?): MergePolicy.MergeSpecification?

Returns the merges necessary to merge the index down to a specified number of segments. This respects the maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index; that segment has neither pending deletions nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so that the MergeScheduler in use can run them concurrently.
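Below is a simplified sketch of how merges can be issued mergeFactor at a time. It assumes the upstream Lucene approach: full-width merges are taken from the tail of the segment list. When no full-width merge fits, the sketch falls back to merging just enough tail segments to reach the target. Upstream picks that last window by size rather than always from the tail, and it also applies the size limits, which the sketch ignores. planForcedMerges is a hypothetical name.

```kotlin
// Simplified sketch of one round of forced-merge planning. Ranges are segment indices.
fun planForcedMerges(numSegments: Int, maxNumSegments: Int, mergeFactor: Int): List<IntRange> {
    require(mergeFactor >= 2 && maxNumSegments >= 1)
    val merges = mutableListOf<IntRange>()
    var last = numSegments
    // Cascade full-width merges of mergeFactor segments from the tail.
    while (last - maxNumSegments + 1 >= mergeFactor) {
        merges.add((last - mergeFactor) until last)
        last -= mergeFactor
    }
    // No full-width merge fits, but there are still too many segments:
    // merge just enough tail segments to reach maxNumSegments.
    if (merges.isEmpty() && last > maxNumSegments) {
        val needed = last - maxNumSegments + 1
        merges.add((last - needed) until last)
    }
    return merges
}
```

Each call plans one round. The merged results are fed back in on the next call until the target count is reached, and returning several independent merges per round is what lets the MergeScheduler run them concurrently.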

findFullFlushMerges

open fun findFullFlushMerges(mergeTrigger: MergeTrigger, segmentInfos: SegmentInfos, mergeContext: MergePolicy.MergeContext): MergePolicy.MergeSpecification?

Identifies merges that we want to execute (synchronously) on commit. By default, this returns the merges from findMerges whose segments are all smaller than maxFullFlushMergeSize.
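The default filtering step can be sketched as follows. Each candidate merge from findMerges is modelled as a list of segment sizes, which is a hypothetical simplification. Only merges whose segments are all below the threshold survive. Judging by the NOTE at the top of this page, the threshold for this policy is minMergeSize.

```kotlin
// Hypothetical sketch: keep only candidate merges in which every segment is smaller than
// maxFullFlushMergeSize. Each candidate is modelled as the list of its segment sizes.
fun fullFlushMerges(candidates: List<List<Long>>, maxFullFlushMergeSize: Long): List<List<Long>> =
    candidates.filter { merge -> merge.all { size -> size < maxFullFlushMergeSize } }
```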

findMerges

open override fun findMerges(mergeTrigger: MergeTrigger?, infos: SegmentInfos?, mergeContext: MergePolicy.MergeContext?): MergePolicy.MergeSpecification?

Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than mergeFactor segments at a given level. When multiple levels have too many segments, this method returns multiple merges, allowing the MergeScheduler to use concurrency.
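The sketch below walks per-segment levels the way upstream Lucene's findMerges does, in simplified form. It is an assumption-based model, not this port's code, and it skips the size limits and already-merging checks. Starting from the oldest segment, it finds the highest remaining level and treats every segment within levelLogSpan of that level as part of it. It then emits merges of mergeFactor adjacent segments from that run. The 0.75 default mirrors Lucene's LEVEL_LOG_SPAN and is an assumption about this port. planLevelMerges is a hypothetical name.

```kotlin
// Simplified sketch of level-based merge selection. levels[i] is segment i's level.
fun planLevelMerges(levels: List<Double>, mergeFactor: Int, levelLogSpan: Double = 0.75): List<IntRange> {
    require(mergeFactor >= 2)
    val merges = mutableListOf<IntRange>()
    var start = 0
    while (start < levels.size) {
        // Highest level among the segments not yet considered.
        val maxLevel = levels.subList(start, levels.size).maxOf { it }
        val levelBottom = maxLevel - levelLogSpan
        // upto = last segment (scanning from the end) that still belongs to this level.
        var upto = levels.size - 1
        while (upto >= start && levels[upto] < levelBottom) upto--
        // Emit full windows of mergeFactor adjacent segments within [start, upto].
        var end = start + mergeFactor
        while (end <= upto + 1) {
            merges.add(start until end)
            start = end
            end = start + mergeFactor
        }
        start = upto + 1
    }
    return merges
}
```

Several levels may each yield a merge in one pass. That is how findMerges can hand the MergeScheduler more than one merge at a time.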

open fun findMerges(vararg readers: CodecReader): MergePolicy.MergeSpecification

Defines the set of merge operations to perform on the provided codec readers.

keepFullyDeletedSegment

open fun keepFullyDeletedSegment(readerIOSupplier: IOSupplier<CodecReader>): Boolean

Returns true if the segment represented by the given CodecReader should be kept even if it is fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.

maxFullFlushMergeSize

open override fun maxFullFlushMergeSize(): Long

Returns the maximum size of segments to be included in full-flush merges by the default implementation of findFullFlushMerges.

numDeletesToMerge

open fun numDeletesToMerge(info: SegmentCommitInfo, delCount: Int, readerSupplier: IOSupplier<CodecReader>): Int

Returns the number of deletes that a merge would claim on the given segment. By default, this is the sum of the on-disk delete count and the pending delete count. However, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.

size

open fun size(info: SegmentCommitInfo, mergeContext: MergePolicy.MergeContext): Long

Returns the byte size of the provided SegmentCommitInfo, prorated by the percentage of non-deleted documents.

toString

open override fun toString(): String

useCompoundFile

open fun useCompoundFile(infos: SegmentInfos, mergedInfo: SegmentCommitInfo, mergeContext: MergePolicy.MergeContext): Boolean

Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true if and only if the size of the given mergedInfo is less than or equal to maxCFSSegmentSizeMB and also less than or equal to TotalIndexSize * noCFSRatio; otherwise it returns false.
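A minimal sketch of that decision, working in bytes. The early returns for noCFSRatio == 0.0 (never use compound files) and noCFSRatio >= 1.0 (skip the ratio check) follow upstream Lucene's MergePolicy and are assumptions about this port. shouldUseCompoundFile is a hypothetical name, and totalIndexBytes stands for TotalIndexSize.

```kotlin
// Hypothetical sketch of the compound-file decision. Sizes are in bytes;
// maxCFSSegmentSizeMB is converted from megabytes.
fun shouldUseCompoundFile(
    mergedSegmentBytes: Long,
    totalIndexBytes: Long,
    maxCFSSegmentSizeMB: Double,
    noCFSRatio: Double,
): Boolean {
    if (noCFSRatio == 0.0) return false // compound files disabled entirely
    val maxCFSSegmentBytes = maxCFSSegmentSizeMB * 1024 * 1024
    if (mergedSegmentBytes > maxCFSSegmentBytes) return false // segment too large for CFS
    if (noCFSRatio >= 1.0) return true // ratio check disabled
    return mergedSegmentBytes <= noCFSRatio * totalIndexBytes
}
```

With noCFSRatio = 0.1 and a 1000 MB index, a 10 MB merged segment would use the compound format, but a 200 MB one would not.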