Lucene94FieldInfosFormat

Lucene 9.0 Field Infos format.

Field names are stored in the field info file, with suffix .fnm.

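FieldInfos (.fnm) --> Header, FieldsCount, <FieldName, FieldNumber, FieldBits, DocValuesBits, DocValuesGen, Attributes, DimensionCount, DimensionNumBytes>^FieldsCount, Footer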

Data types:

  • Header --> IndexHeader

  • FieldsCount --> DataOutput.writeVInt

  • FieldName --> DataOutput.writeString

  • FieldBits, IndexOptions, DocValuesBits --> DataOutput.writeByte

  • FieldNumber, DimensionCount, DimensionNumBytes --> DataOutput.writeInt

  • Attributes --> DataOutput.writeMapOfStrings

  • DocValuesGen --> DataOutput.writeLong

  • Footer --> CodecFooter
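
To make the FieldsCount encoding concrete, here is a minimal sketch of the variable-length integer layout that Lucene documents for DataOutput.writeVInt: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last. The functions encodeVInt and decodeVInt are hypothetical helpers written for illustration; they are not part of this class or of DataOutput.

```kotlin
// Illustrative helpers only (hypothetical names), mirroring the documented VInt layout.

// Encodes an Int as a VInt: 7 bits per byte, low-order group first,
// high bit (0x80) set while more bytes follow.
fun encodeVInt(value: Int): ByteArray {
    val out = ArrayList<Byte>()
    var v = value
    while ((v and 0x7F.inv()) != 0) {
        out.add(((v and 0x7F) or 0x80).toByte())
        v = v ushr 7
    }
    out.add(v.toByte())
    return out.toByteArray()
}

// Decodes a VInt from the start of the given bytes.
fun decodeVInt(bytes: ByteArray): Int {
    var result = 0
    var shift = 0
    for (b in bytes) {
        result = result or ((b.toInt() and 0x7F) shl shift)
        if ((b.toInt() and 0x80) == 0) break // last byte of this VInt
        shift += 7
    }
    return result
}
```

With this sketch, encodeVInt(300) produces the two bytes 0xAC, 0x02, and decodeVInt turns them back into 300. Values below 128 occupy a single byte, which suits the typically small field counts.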

Field Descriptions:

  • FieldsCount: the number of fields in this file.

  • FieldName: name of the field as a UTF-8 String.

  • FieldNumber: the field's number. Unlike previous versions of Lucene, fields are numbered explicitly rather than implicitly by their order in the file.

  • FieldBits: a byte containing field options.

      • The lowest-order bit (0x1) is one for fields that have term vectors stored, and zero for fields without term vectors.

      • If the second-lowest-order bit (0x2) is set, norms are omitted for the indexed field.

      • If the third-lowest-order bit (0x4) is set, payloads are stored for the indexed field.

  • IndexOptions: a byte containing index options.

      • 0: not indexed

      • 1: indexed as DOCS_ONLY

      • 2: indexed as DOCS_AND_FREQS

      • 3: indexed as DOCS_AND_FREQS_AND_POSITIONS

      • 4: indexed as DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

  • DocValuesBits: a byte containing per-document value types. The type is recorded as two four-bit integers, with the high-order bits representing norms options and the low-order bits representing DocValues options. Each four-bit integer is decoded as follows:

      • 0: no DocValues for this field.

      • 1: NumericDocValues. (DocValuesType.NUMERIC)

      • 2: BinaryDocValues. (DocValuesType.BINARY)

      • 3: SortedDocValues. (DocValuesType.SORTED)

  • DocValuesGen: the generation count of the field's DocValues. If this is -1, there are no DocValues updates to that field. Anything above zero means there are updates stored by DocValuesFormat.

  • Attributes: a key-value map of codec-private attributes.

  • PointDimensionCount, PointNumBytes: these are non-zero only if the field is indexed as points, e.g. using org.gnit.lucenekmp.document.LongPoint.

  • VectorDimension: non-zero only if the field is indexed as vectors.

  • VectorEncoding: a byte containing the encoding of vector values:

      • 0: BYTE. Samples are stored as signed bytes.

      • 1: FLOAT32. Samples are stored in IEEE 32-bit floating point format.

  • VectorSimilarityFunction: a byte identifying the distance function used for similarity calculation.

      • 0: EUCLIDEAN distance. (VectorSimilarityFunction.EUCLIDEAN)

      • 1: DOT_PRODUCT similarity. (VectorSimilarityFunction.DOT_PRODUCT)

      • 2: COSINE similarity. (VectorSimilarityFunction.COSINE)

      • 3: MAXIMUM_INNER_PRODUCT similarity. (VectorSimilarityFunction.MAXIMUM_INNER_PRODUCT)
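
The byte-level descriptions above can be sketched as a few small decoders. This is an illustration under stated assumptions, not the code this class uses: the enums, FieldFlags, and every function name below are hypothetical stand-ins, and only the bit masks and numeric codes are taken from the descriptions in this section.

```kotlin
// Hypothetical stand-ins for the library types; only the numeric codes come
// from the field descriptions. Each enum's ordinal equals its on-disk code.
enum class IndexOptionsCode {
    NOT_INDEXED, DOCS_ONLY, DOCS_AND_FREQS,
    DOCS_AND_FREQS_AND_POSITIONS, DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
}
enum class DocValuesCode { NONE, NUMERIC, BINARY, SORTED }
enum class VectorEncodingCode { BYTE, FLOAT32 }
enum class VectorSimilarityCode { EUCLIDEAN, DOT_PRODUCT, COSINE, MAXIMUM_INNER_PRODUCT }

data class FieldFlags(
    val storesTermVectors: Boolean, // bit 0x1
    val omitsNorms: Boolean,        // bit 0x2
    val storesPayloads: Boolean     // bit 0x4
)

// Tests the three low-order bits of FieldBits.
fun decodeFieldBits(bits: Byte): FieldFlags {
    val b = bits.toInt()
    return FieldFlags(
        storesTermVectors = (b and 0x1) != 0,
        omitsNorms = (b and 0x2) != 0,
        storesPayloads = (b and 0x4) != 0
    )
}

// Splits DocValuesBits into (high nibble, low nibble):
// the high nibble carries norms options, the low nibble DocValues options.
fun splitNibbles(bits: Byte): Pair<Int, Int> {
    val b = bits.toInt() and 0xFF
    return (b ushr 4) to (b and 0x0F)
}

// Codes outside the listed ranges decode to null instead of throwing.
fun decodeIndexOptions(code: Byte): IndexOptionsCode? =
    IndexOptionsCode.values().getOrNull(code.toInt())

fun decodeDocValuesNibble(nibble: Int): DocValuesCode? =
    DocValuesCode.values().getOrNull(nibble)

fun decodeVectorEncoding(code: Byte): VectorEncodingCode? =
    VectorEncodingCode.values().getOrNull(code.toInt())

fun decodeVectorSimilarity(code: Byte): VectorSimilarityCode? =
    VectorSimilarityCode.values().getOrNull(code.toInt())
```

For example, a FieldBits value of 0x05 decodes to term vectors stored, norms not omitted, and payloads stored, while a DocValuesBits value of 0x13 splits into a high nibble of 1 and a low nibble of 3. Returning null for unlisted codes is a design choice of this sketch; a real reader would more likely treat them as index corruption.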

Constructors

constructor()

Types

object Companion

Functions

open override fun read(directory: Directory, segmentInfo: SegmentInfo, segmentSuffix: String, context: IOContext): FieldInfos

Reads the FieldInfos previously written with write.

open override fun write(directory: Directory, segmentInfo: SegmentInfo, segmentSuffix: String, infos: FieldInfos, context: IOContext)

Writes the provided FieldInfos to the directory.