LogMergePolicy
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using .getMergeFactor and .setMergeFactor respectively.
This class is abstract and requires a subclass to define the .size method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
NOTE: This policy returns natural merges whose size is below the .minMergeSize for .findFullFlushMerges.
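The split between the abstract policy and its two subclasses can be pictured with a minimal sketch. The names below (`SizeMeasureSketch`, `Segment`, `DocCountMeasure`, `ByteSizeMeasure`) are hypothetical stand-ins invented for illustration, not the library's types; only the idea that a subclass decides what "size" means comes from the description above.

```java
final class SizeMeasureSketch {
    // Hypothetical stand-in for a segment; not a library type.
    static final class Segment {
        final int docCount;
        final long byteSize;

        Segment(int docCount, long byteSize) {
            this.docCount = docCount;
            this.byteSize = byteSize;
        }
    }

    // Simplified analogue of the abstract policy: subclasses define "size".
    abstract static class SizeMeasure {
        abstract long size(Segment segment);
    }

    // Analogue of measuring size by document count.
    static final class DocCountMeasure extends SizeMeasure {
        @Override
        long size(Segment segment) {
            return segment.docCount;
        }
    }

    // Analogue of measuring size by total bytes on disk.
    static final class ByteSizeMeasure extends SizeMeasure {
        @Override
        long size(Segment segment) {
            return segment.byteSize;
        }
    }

    public static void main(String[] args) {
        Segment segment = new Segment(100, 4096L);
        System.out.println("by docs:  " + new DocCountMeasure().size(segment));
        System.out.println("by bytes: " + new ByteSizeMeasure().size(segment));
    }
}
```

The same segment can therefore land in different levels depending on which measure the chosen subclass applies.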
Inheritors
Properties
If true, we pro-rate a segment's size by the percentage of non-deleted documents.
If a segment has more than this many documents then it will never be merged.
If the size of a segment exceeds this value then it will never be merged.
How many segments to merge at a time.
Any segments whose size is smaller than this value will be candidates for full-flush merges and merged more aggressively.
Target search concurrency. This merge policy will avoid creating segments that have more than maxDoc / targetSearchConcurrency documents.
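The target search concurrency bound is simple arithmetic, sketched below. The class and method names are hypothetical, and plain integer division is an assumption; the actual implementation may round differently.

```java
final class SearchConcurrencySketch {
    // Upper bound on documents per segment: maxDoc / targetSearchConcurrency.
    // Integer division is an assumption made for this sketch.
    static int maxDocsPerSegment(int indexMaxDoc, int targetSearchConcurrency) {
        if (targetSearchConcurrency < 1) {
            throw new IllegalArgumentException("targetSearchConcurrency must be >= 1");
        }
        return indexMaxDoc / targetSearchConcurrency;
    }

    // A candidate merge is acceptable here only if the merged segment stays within the bound.
    static boolean mergeAllowed(int[] segmentDocCounts, int indexMaxDoc, int targetSearchConcurrency) {
        long mergedDocs = 0;
        for (int docs : segmentDocCounts) {
            mergedDocs += docs;
        }
        return mergedDocs <= maxDocsPerSegment(indexMaxDoc, targetSearchConcurrency);
    }

    public static void main(String[] args) {
        // With 1,000,000 documents and a target concurrency of 4, the bound is 250,000.
        System.out.println(maxDocsPerSegment(1_000_000, 4));
        System.out.println(mergeAllowed(new int[] {100_000, 100_000}, 1_000_000, 4));
        System.out.println(mergeAllowed(new int[] {200_000, 100_000}, 1_000_000, 4));
    }
}
```

A higher target keeps segments smaller, so a search can be split across more slices at the cost of more segments to visit.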
Functions
Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.
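One way to picture "adjacent segments that have deletes, up to mergeFactor at a time" is the sketch below. It works on a plain array of per-segment delete counts and returns half-open index ranges; the class name, method name, and range representation are assumptions for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;

final class ForceMergeDeletesSketch {
    // Returns [start, end) index ranges of adjacent segments to merge together.
    static List<int[]> findForcedDeletesMerges(int[] delCounts, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int firstWithDeletes = -1; // start of the current run, or -1 if no run is open
        for (int i = 0; i < delCounts.length; i++) {
            if (delCounts[i] > 0) {
                if (firstWithDeletes == -1) {
                    firstWithDeletes = i; // open a new run
                } else if (i - firstWithDeletes == mergeFactor) {
                    // The run already holds mergeFactor segments: emit it and start a new one here.
                    merges.add(new int[] {firstWithDeletes, i});
                    firstWithDeletes = i;
                }
            } else if (firstWithDeletes != -1) {
                // A segment without deletes ends the run, even if it is shorter than mergeFactor.
                merges.add(new int[] {firstWithDeletes, i});
                firstWithDeletes = -1;
            }
        }
        if (firstWithDeletes != -1) {
            merges.add(new int[] {firstWithDeletes, delCounts.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        int[] delCounts = {0, 3, 1, 0, 2, 2, 2, 0};
        for (int[] range : findForcedDeletesMerges(delCounts, 2)) {
            System.out.println("[" + range[0] + ", " + range[1] + ")");
        }
    }
}
```

Note that in this sketch a lone segment with deletes still forms its own merge, since rewriting it is what removes the deleted documents.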
Returns the merges necessary to merge the index down to a specified number of segments. This respects the .maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has no pending deletions and no separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the MergeScheduler in use may make use of concurrency.
Identifies merges that we want to execute (synchronously) on commit. By default, this will return the merges from .findMerges whose segments are all smaller than .maxFullFlushMergeSize.
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than mergeFactor segments (see .setMergeFactor) at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.
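The level-based selection can be illustrated with a deliberately simplified sketch. It assigns each segment a level of floor(log base mergeFactor of size), computed with integer division, and emits a merge whenever mergeFactor adjacent segments share a level. Both the level formula and the exact trigger at the boundary are assumptions of this sketch; the real policy has more rules (size limits, level spans, segments already merging). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LevelMergeSketch {
    // Level = number of times size can be divided by mergeFactor before dropping below it.
    // Integer arithmetic avoids floating-point rounding at exact powers.
    static int level(long size, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        int level = 0;
        while (size >= mergeFactor) {
            size /= mergeFactor;
            level++;
        }
        return level;
    }

    // Returns [start, end) ranges of adjacent same-level segments, mergeFactor at a time.
    static List<int[]> findMerges(long[] sizes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int runStart = 0;
        for (int i = 0; i < sizes.length; i++) {
            if (i > runStart && level(sizes[i], mergeFactor) != level(sizes[runStart], mergeFactor)) {
                runStart = i; // level changed: start a new run at this segment
            }
            if (i - runStart + 1 == mergeFactor) {
                merges.add(new int[] {runStart, i + 1});
                runStart = i + 1;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        long[] sizes = {30, 40, 50, 4, 5, 6, 7, 1};
        for (int[] range : findMerges(sizes, 3)) {
            System.out.println("merge segments [" + range[0] + ", " + range[1] + ")");
        }
    }
}
```

Because two different levels are full in the example input, two independent merges come back, which is what lets a scheduler run them concurrently.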
Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.
Returns the maximum size of segments to be included in full-flush merges by the default implementation of .findFullFlushMerges.
Returns the number of deletes that a merge would claim on the given segment. This method will by default return the sum of the del count on disk and the pending delete count. Yet, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.
Returns the byte size of the provided SegmentCommitInfo, prorated by percentage of non-deleted documents.
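Prorating by non-deleted documents amounts to scaling the byte size by the live-document fraction. The sketch below shows that arithmetic under the assumption that the ratio is delCount / maxDoc and that an empty segment is left unscaled; the class and method names are hypothetical.

```java
final class ProratedSizeSketch {
    // Scales byteSize by the fraction of documents that are not deleted.
    static long proratedBytes(long byteSize, int maxDoc, int delCount) {
        if (maxDoc <= 0) {
            return byteSize; // nothing to prorate against (assumption of this sketch)
        }
        double delRatio = (double) delCount / maxDoc;
        return (long) (byteSize * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        // 25 of 100 documents deleted: the segment counts as 75% of its on-disk size.
        System.out.println(proratedBytes(1000L, 100, 25));
    }
}
```

A segment with many deletes therefore looks smaller to the policy, which makes it more likely to be picked up for merging.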
Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true if and only if the size of the given mergedInfo is less than or equal to .getMaxCFSSegmentSizeMB and also less than or equal to TotalIndexSize * .getNoCFSRatio; otherwise it returns false.
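The default compound-file rule described above reduces to two comparisons. The sketch below restates it with plain numbers; the class name, method signature, and parameter names are hypothetical, and the conversion of megabytes to bytes as 1024 * 1024 is an assumption.

```java
final class CompoundFileSketch {
    // True only if the merged segment is within both the absolute cap and the ratio cap.
    static boolean useCompoundFile(
            long mergedBytes, long totalIndexBytes, double maxCfsSegmentSizeMB, double noCfsRatio) {
        double maxCfsBytes = maxCfsSegmentSizeMB * 1024 * 1024;
        return mergedBytes <= maxCfsBytes && mergedBytes <= totalIndexBytes * noCfsRatio;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 5 MB segment, 100 MB index, 10 MB cap, ratio 0.1: both checks pass.
        System.out.println(useCompoundFile(5 * mb, 100 * mb, 10.0, 0.1));
        // 20 MB segment exceeds the 10 MB cap.
        System.out.println(useCompoundFile(20 * mb, 100 * mb, 10.0, 1.0));
    }
}
```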