BytesRefHash

class BytesRefHash(pool: ByteBlockPool, capacity: Int, bytesStartArray: BytesRefHash.BytesStartArray) : Accountable

BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maintains mappings of byte arrays to ids (conceptually a Map from BytesRef to Int) and stores the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase with each added BytesRef.

Note: A BytesRef instance passed to add must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2. The internal storage is limited to 2GB of total byte storage.
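To make the id mapping concrete, here is a minimal sketch of the idea, not the actual implementation. `TinyBytesHash` and all of its members are hypothetical names introduced for illustration: it copies every added byte sequence into one growing buffer and hands out increasing ids. Returning `-(id + 1)` for a duplicate and `-1` for a missing entry follows the convention of Lucene's Java `BytesRefHash`; whether this port behaves identically is an assumption.

```kotlin
// Hypothetical, simplified stand-in for illustration; not the real BytesRefHash.
class TinyBytesHash {
    private var storage = ByteArray(16)     // one contiguous buffer holding all added bytes
    private var used = 0                    // number of bytes of storage in use
    private val starts = ArrayList<Int>()   // id -> start offset in storage
    private val lengths = ArrayList<Int>()  // id -> length in bytes
    private val buckets = HashMap<Int, MutableList<Int>>() // content hash -> candidate ids

    fun size(): Int = starts.size

    // Returns the new id, or -(id + 1) if the bytes were already present
    // (assumed convention, borrowed from Lucene's Java BytesRefHash.add).
    fun add(bytes: ByteArray): Int {
        val existing = find(bytes)
        if (existing >= 0) return -(existing + 1)
        if (used + bytes.size > storage.size) {
            // Grow the buffer; existing offsets stay valid because data is copied in place.
            storage = storage.copyOf(maxOf(storage.size * 2, used + bytes.size))
        }
        bytes.copyInto(storage, used)
        val id = starts.size                // ids increase by one per new entry
        starts.add(used)
        lengths.add(bytes.size)
        used += bytes.size
        buckets.getOrPut(bytes.contentHashCode()) { mutableListOf() }.add(id)
        return id
    }

    // Returns the id for the given bytes, or -1 if there is no mapping.
    fun find(bytes: ByteArray): Int {
        val candidates = buckets[bytes.contentHashCode()] ?: return -1
        for (id in candidates) {
            if (get(id).contentEquals(bytes)) return id
        }
        return -1
    }

    // Returns a copy of the bytes stored for the given id.
    fun get(id: Int): ByteArray = storage.copyOfRange(starts[id], starts[id] + lengths[id])
}
```

In this sketch, adding "foo", then "bar", then "foo" again yields 0, 1, and -1: the first two calls create new ids, and the third reports that id 0 already exists. The real class stores bytes in ByteBlockPool blocks rather than a single array, which is where the per-entry length limit noted above comes from.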

Constructors

constructor(pool: ByteBlockPool, capacity: Int, bytesStartArray: BytesRefHash.BytesStartArray)
constructor(pool: ByteBlockPool = ByteBlockPool(DirectAllocator()))

Creates a new BytesRefHash with a ByteBlockPool using a DirectAllocator.

Types

abstract class BytesStartArray

Manages allocation of the per-term addresses.

object Companion
open class DirectBytesStartArray @JvmOverloads constructor(initSize: Int, bytesUsed: Counter = Counter.newCounter()) : BytesRefHash.BytesStartArray

A simple BytesStartArray that tracks memory allocation using a private Counter instance.


Thrown if a BytesRef exceeds the BytesRefHash limit of ByteBlockPool.BYTE_BLOCK_SIZE-2.

Properties


Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).


Functions

fun add(bytes: BytesRef): Int

Adds a new BytesRef.

fun addByPoolOffset(offset: Int): Int

Adds an "arbitrary" int offset instead of a BytesRef term. The indexer uses this to hold the hash for term vectors: they do not store the byte[] term redundantly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.

fun byteStart(bytesID: Int): Int

Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.

fun clear(resetPool: Boolean = true)

Clears all entries from this BytesRefHash. If resetPool is true, the underlying ByteBlockPool is reset as well.

fun close()

Closes the BytesRefHash and releases all internally used memory.


Returns the ids array in arbitrary order. Valid ids start at an offset of 0 and end at a limit of size() - 1.

fun find(bytes: BytesRef): Int

Returns the id of the given BytesRef.

fun get(bytesID: Int, ref: BytesRef): BytesRef

Populates and returns a BytesRef with the bytes for the given bytesID.

open override fun ramBytesUsed(): Long

Return the memory usage of this object in bytes. Negative values are illegal.

fun reinit()

Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.

fun size(): Int

Returns the number of BytesRef values in this BytesRefHash.

fun sort(): IntArray

Returns the values array sorted by the referenced byte values.
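
As a rough illustration of what "sorted by the referenced byte values" means, the sketch below orders ids by the bytes they point to rather than by the ids themselves. The helpers `compareUnsigned` and `sortIdsByBytes` are hypothetical and not part of BytesRefHash. Comparing bytes as unsigned values is an assumption based on how Lucene's Java BytesRef orders its content; the real method sorts its internal ids array and may use a different algorithm.

```kotlin
// Hypothetical helpers for illustration; not part of BytesRefHash.

// Compares two byte arrays lexicographically, treating each byte as unsigned (0..255).
fun compareUnsigned(a: ByteArray, b: ByteArray): Int {
    val n = minOf(a.size, b.size)
    for (i in 0 until n) {
        val diff = (a[i].toInt() and 0xFF) - (b[i].toInt() and 0xFF)
        if (diff != 0) return diff
    }
    // All shared positions are equal, so the shorter array sorts first.
    return a.size - b.size
}

// Returns the ids (here: indices into values) ordered by the bytes they reference.
fun sortIdsByBytes(values: List<ByteArray>): IntArray =
    values.indices
        .sortedWith(Comparator { x, y -> compareUnsigned(values[x], values[y]) })
        .toIntArray()
```

The unsigned comparison matters for bytes at or above 0x80: compared as signed Kotlin Byte values they would be negative and sort before ASCII text, whereas here they sort after it.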