LogByteSizeMergePolicy
This is a LogMergePolicy that measures the size of a segment as the total byte size of the segment's files.
Properties
If true, we pro-rate a segment's size by the percentage of non-deleted documents.
If a segment has more than this many documents then it will never be merged.
If the size of a segment exceeds this value then it will never be merged.
How many segments to merge at a time.
Any segments whose size is smaller than this value will be candidates for full-flush merges and merged more aggressively.
Target search concurrency. This merge policy will avoid creating segments that have more than maxDoc / targetSearchConcurrency documents.
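As a rough illustration of the cap described above, the following Python sketch models the arithmetic only; it is not the Lucene API. The function names are hypothetical, and the rounding direction of the division is an assumption (floor is used here).

```python
def max_docs_per_merged_segment(total_max_doc: int, target_search_concurrency: int) -> int:
    """Upper bound on documents in a merged segment: maxDoc / targetSearchConcurrency.

    Assumption: integer (floor) division; the real policy may round differently.
    """
    if target_search_concurrency < 1:
        raise ValueError("target_search_concurrency must be >= 1")
    return total_max_doc // target_search_concurrency


def merge_allowed(segment_doc_counts, total_max_doc, target_search_concurrency):
    """True if merging the given segments stays within the per-segment document cap."""
    cap = max_docs_per_merged_segment(total_max_doc, target_search_concurrency)
    return sum(segment_doc_counts) <= cap
```

With an index of 1,000,000 documents and a target search concurrency of 4, this model rejects any merge that would produce a segment with more than 250,000 documents.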
Functions
Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.
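The adjacent-segment grouping described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the library's code: segments are represented only by their delete counts, merges are returned as half-open index ranges, and the function name is hypothetical.

```python
def find_forced_deletes_merges(del_counts, merge_factor):
    """Group runs of adjacent segments that have deletes, at most merge_factor per merge.

    del_counts[i] is the number of deleted documents in segment i.
    Returns a list of (start, end) half-open index ranges, one per merge.
    """
    merges = []
    first = -1  # start index of the current run of segments with deletes
    for i, del_count in enumerate(del_counts):
        if del_count > 0:
            if first == -1:
                first = i
            elif i - first == merge_factor:
                # The run already holds merge_factor segments: emit it and start a new run.
                merges.append((first, i))
                first = i
        elif first != -1:
            # A segment without deletes ends the run, even if it is shorter than merge_factor.
            merges.append((first, i))
            first = -1
    if first != -1:
        merges.append((first, len(del_counts)))
    return merges
```

In this model a run may consist of a single segment, in which case the merge simply rewrites that segment without its deleted documents.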
Returns the merges necessary to merge the index down to a specified number of segments. This respects the .maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index; that segment has neither pending deletions nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so that the MergeScheduler in use can run them concurrently.
Identifies merges that we want to execute (synchronously) on commit. By default, this returns the merges from .findMerges whose segments are all smaller than .maxFullFlushMergeSize.
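The default filtering described above amounts to keeping only those candidate merges in which every segment is below the size threshold. A minimal Python sketch, assuming each candidate merge is represented as a list of segment sizes in bytes (a hypothetical representation, not the Lucene types):

```python
def find_full_flush_merges(candidate_merges, max_full_flush_merge_size):
    """Keep only the merges whose segments are all smaller than the threshold.

    candidate_merges: list of merges, each a list of segment sizes in bytes.
    """
    return [
        merge
        for merge in candidate_merges
        if all(size < max_full_flush_merge_size for size in merge)
    ]
```

A merge containing even one segment at or above the threshold is dropped as a whole in this model; its small segments are not re-grouped.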
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than mergeFactor segments (see .setMergeFactor) at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.
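To make the notion of levels concrete, here is a deliberately simplified Python model. It is not the algorithm the library uses: the level of a segment is taken to be the integer part of the logarithm of its byte size in base mergeFactor, runs of adjacent segments on the same level are grouped, and a run is treated as mergeable once it holds at least mergeFactor segments. All three choices, including whether the threshold is inclusive, are assumptions made for illustration, and the function names are hypothetical.

```python
def level_of(size_bytes, merge_factor):
    """Integer level of a segment: floor(log_mergeFactor(size)), computed without floats."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be >= 2")
    level = 0
    size = max(size_bytes, 1)
    while size >= merge_factor:
        size //= merge_factor
        level += 1
    return level


def find_merges(sizes, merge_factor):
    """Return half-open (start, end) ranges of adjacent same-level segments to merge."""
    merges = []
    start = 0
    while start < len(sizes):
        level = level_of(sizes[start], merge_factor)
        end = start + 1
        # Extend the run while the next segment sits on the same level.
        while end < len(sizes) and level_of(sizes[end], merge_factor) == level:
            end += 1
        # Emit one merge per full group of merge_factor segments in the run.
        pos = start
        while end - pos >= merge_factor:
            merges.append((pos, pos + merge_factor))
            pos += merge_factor
        start = end
    return merges
```

Because two levels are overfull in an input such as `[5000, 4000, 6000, 50, 60, 70, 80]` with a merge factor of 3, the model returns two independent merges, which a scheduler could then run concurrently.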
Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.
Returns the maximum size of segments to be included in full-flush merges by the default implementation of .findFullFlushMerges.
Returns the number of deletes that a merge would claim on the given segment. By default, this method returns the sum of the delete count on disk and the pending delete count. However, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.
Returns the byte size of the provided SegmentCommitInfo, prorated by the percentage of non-deleted documents.
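The proration described above scales the on-disk size by the fraction of documents that are still live. A minimal Python sketch of that arithmetic, with hypothetical parameter names and the assumption that an empty segment (maxDoc of zero) is reported at its raw size:

```python
def prorated_size(byte_size: int, max_doc: int, del_count: int) -> int:
    """Byte size scaled by the fraction of non-deleted documents.

    Assumption: a segment with max_doc <= 0 is reported at its raw byte size.
    """
    if max_doc <= 0:
        return byte_size
    del_ratio = del_count / max_doc
    return int(byte_size * (1.0 - del_ratio))
```

For example, a 1000-byte segment with 100 documents, 25 of them deleted, counts as 750 bytes in this model, which makes delete-heavy segments look smaller and therefore cheaper to merge.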
Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true iff the size of the given mergedInfo is less than or equal to .getMaxCFSSegmentSizeMB and less than or equal to TotalIndexSize * .getNoCFSRatio; otherwise it returns false.
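The two conditions above can be written as a single predicate. The Python sketch below models only the comparison stated in the description; the parameter names are hypothetical, sizes are assumed to be in bytes, and the megabyte limit is assumed to use 1024 * 1024 bytes per MB.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def use_compound_file(merged_size_bytes, total_index_size_bytes,
                      max_cfs_segment_size_mb, no_cfs_ratio):
    """True iff the new segment is within both the absolute and the relative size limit."""
    within_absolute_limit = merged_size_bytes <= max_cfs_segment_size_mb * BYTES_PER_MB
    within_relative_limit = merged_size_bytes <= total_index_size_bytes * no_cfs_ratio
    return within_absolute_limit and within_relative_limit
```

With a 10 MB absolute limit and a ratio of 0.1, a 5 MB segment in a 100 MB index qualifies for the compound file format in this model, while an 8 MB segment in a 40 MB index does not, because it exceeds 10% of the index.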