core/org.gnit.lucenekmp.util.quantization/ScalarQuantizer

ScalarQuantizer

class ScalarQuantizer(minQuantile: Float, maxQuantile: Float, bits: Byte)

Will scalar quantize float vectors into int8 byte values. This is a lossy transformation. Scalar quantization works by first calculating the quantiles of the float vector values. The quantiles are calculated using the configured confidence interval. The minQuantile, maxQuantile are then used to scale the values into the range 0, 127 and bucketed into the nearest byte values.

How Scalar Quantization Works

The basic mathematical equations behind this are fairly straight forward and based on min/max normalization. Given a float vector v and a confidenceInterval q we can calculate the quantiles of the vector values minQuantile, maxQuantile.

byte = (float - minQuantile) * 127/(maxQuantile - minQuantile)
float = (maxQuantile - minQuantile)/127 * byte + minQuantile

This then means to multiply two float values together (e.g. dot_product) we can do the following:

float1 * float2 ~= (byte1 * (maxQuantile - minQuantile)/127 + minQuantile) * (byte2 * (maxQuantile - minQuantile)/127 + minQuantile)
float1 * float2 ~= (byte1 * byte2 * (maxQuantile - minQuantile)^2)/(127^2) + (byte1 * minQuantile * (maxQuantile - minQuantile)/127) + (byte2 * minQuantile * (maxQuantile - minQuantile)/127) + minQuantile^2
let alpha = (maxQuantile - minQuantile)/127
float1 * float2 ~= (byte1 * byte2 * alpha^2) + (byte1 * minQuantile * alpha) + (byte2 * minQuantile * alpha) + minQuantile^2

The expansion for square distance is much simpler:

square_distance = (float1 - float2)^2
(float1 - float2)^2 ~= (byte1 * alpha + minQuantile - byte2 * alpha - minQuantile)^2
= (alpha*byte1 + minQuantile)^2 + (alpha*byte2 + minQuantile)^2 - 2*(alpha*byte1 + minQuantile)(alpha*byte2 + minQuantile)
this can be simplified to:
= alpha^2 (byte1 - byte2)^2