FuzzySet
A class used to represent a set of many, potentially large, values (e.g. many long strings such as URLs) using significantly less memory than storing the values themselves would require.
The set is "lossy" in that it cannot definitively state that it does contain a value, but it can definitively say that a value is not in the set. It can therefore be used as a Bloom filter. The set can also be used for fuzzy counting, because it can estimate reasonably accurately how many unique values it contains.
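Fuzzy counting works because the number of set bits grows predictably with the number of distinct values recorded. The sketch below uses the standard Bloom filter cardinality estimate, n ~ -(m / k) ln(1 - X / m), where m is the bitset size, k the number of hashes per value and X the number of set bits. This is an assumption for illustration: the exact formula FuzzySet uses is not stated here, and the class and method names are hypothetical.

```java
public final class UniqueValueEstimate {

    /**
     * Standard Bloom filter cardinality estimate (assumed, not FuzzySet's own code):
     *   n ~ -(m / k) * ln(1 - X / m)
     * m = bitset size, k = hashes per value, X = number of bits currently set.
     */
    static double estimateUniqueValues(int m, int k, int setBits) {
        if (m <= 0 || k <= 0 || setBits < 0 || setBits > m) {
            throw new IllegalArgumentException("invalid bitset statistics");
        }
        if (setBits == m) {
            // Every bit is set: the set is saturated and the estimate diverges.
            return Double.POSITIVE_INFINITY;
        }
        double saturation = (double) setBits / m;
        return -((double) m / k) * Math.log(1.0 - saturation);
    }

    public static void main(String[] args) {
        // A 1024-bit set with one hash per value and 100 bits set.
        // The estimate comes out slightly above 100 because it allows for
        // values whose bits collided with bits that were already set.
        System.out.println(estimateUniqueValues(1024, 1, 100));
    }
}
```

The estimate degrades as saturation approaches 1, which is one reason to keep the set sized for a target saturation level.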
This class is NOT threadsafe.
Internally, a bitset is used to record values. Once a client has finished recording a stream of values, the .downsize method can be used to create a smaller set that is sized appropriately for the number of values recorded and the desired saturation level.
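One way a bitset can be shrunk without losing recorded values is to "fold" it: OR together all bits whose indexes are congruent modulo the new size. The sketch below assumes the new size divides the old size, so that (h mod oldSize) mod newSize equals h mod newSize for every hash h and lookups against the smaller set never produce a false NO. This is an illustrative technique, not necessarily how .downsize is implemented, and it does not show how the target size is chosen from the saturation level; `BitSetFold` and `fold` are hypothetical names.

```java
import java.util.BitSet;

public final class BitSetFold {

    /**
     * Folds a bitset of oldSize bits into one of newSize bits by OR-ing
     * together all bits whose indexes are equal modulo newSize.
     * Assumes newSize divides oldSize so existing memberships are preserved.
     */
    static BitSet fold(BitSet bits, int oldSize, int newSize) {
        if (newSize <= 0 || oldSize % newSize != 0) {
            throw new IllegalArgumentException("newSize must divide oldSize");
        }
        BitSet folded = new BitSet(newSize);
        // Visit only the set bits of the original bitset.
        for (int i = bits.nextSetBit(0); i >= 0 && i < oldSize; i = bits.nextSetBit(i + 1)) {
            folded.set(i % newSize);
        }
        return folded;
    }

    public static void main(String[] args) {
        BitSet big = new BitSet(16);
        big.set(3);
        big.set(13);
        BitSet small = fold(big, 16, 8);
        System.out.println(small); // {3, 5}
    }
}
```

Folding raises saturation (more bits set per slot), so in practice the new size would be the smallest one that keeps saturation under the desired level.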
Types
Result from FuzzySet.contains: it can never definitively return YES (the answer is always MAYBE), but it can sometimes definitively return NO.
Properties
Functions
Records a value in the set. The referenced bytes are hashed to a 64-bit value, from which two 32-bit hashes are derived (its most significant and least significant halves). These two hashes can be combined to derive further hashes (see https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf). Finally, each generated hash is reduced modulo n, where n is the chosen size of the internal bitset.
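The hash derivation can be sketched with the double-hashing scheme from the cited paper, g_i = (h1 + i * h2) mod n, where h1 and h2 are the low and high 32 bits of one 64-bit hash. Assumptions: FNV-1a 64 stands in for whatever 64-bit hash FuzzySet actually uses, the number of derived hashes k is a free parameter, and `DoubleHashingAdd`, `fnv1a64`, `indexes` and `addValue` here are hypothetical helpers rather than the library's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public final class DoubleHashingAdd {

    /** Stand-in 64-bit hash (FNV-1a); an assumption, not FuzzySet's hash. */
    static long fnv1a64(byte[] bytes) {
        long hash = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
        for (byte b : bytes) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;     // FNV-1a 64-bit prime (wraps mod 2^64)
        }
        return hash;
    }

    /** Derives k bit indexes g_i = (h1 + i * h2) mod n from one 64-bit hash. */
    static int[] indexes(long hash64, int k, int n) {
        int h1 = (int) hash64;          // least significant 32 bits
        int h2 = (int) (hash64 >>> 32); // most significant 32 bits
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            int combined = h1 + i * h2;             // 32-bit overflow is harmless here
            result[i] = Math.floorMod(combined, n); // always in [0, n)
        }
        return result;
    }

    /** Records a value by setting each of its k derived bits. */
    static void addValue(BitSet bits, int n, int k, byte[] value) {
        for (int index : indexes(fnv1a64(value), k, n)) {
            bits.set(index);
        }
    }

    public static void main(String[] args) {
        int n = 1024; // bitset size
        int k = 3;    // derived hashes per value
        BitSet bits = new BitSet(n);
        addValue(bits, n, k, "https://example.org/page".getBytes(StandardCharsets.UTF_8));
        System.out.println(bits.cardinality() + " bits set");
    }
}
```

Math.floorMod is used instead of % because the combined hash can be negative in Java's signed arithmetic.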
The main method required for a Bloom filter: given a value, it determines set membership. Unlike a conventional set, the fuzzy set returns NO or MAYBE rather than true or false. Hash generation follows the same principles as .addValue.
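A membership check recomputes the same bit indexes as the add step and inspects them: any clear bit proves the value was never recorded, while all bits set only means it might have been. The self-contained sketch below assumes FNV-1a as the 64-bit hash and the double-hashing derivation from the paper cited above; `TinyFuzzySet` and its `ContainsResult` values `MAYBE`/`NO` are hypothetical names that mirror the described semantics, not FuzzySet's actual types.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public final class TinyFuzzySet {

    /** Hypothetical result type mirroring the NO / MAYBE semantics. */
    enum ContainsResult { MAYBE, NO }

    private final BitSet bits;
    private final int n; // size of the bitset
    private final int k; // derived hashes per value

    TinyFuzzySet(int n, int k) {
        if (n <= 0 || k <= 0) throw new IllegalArgumentException("n and k must be positive");
        this.bits = new BitSet(n);
        this.n = n;
        this.k = k;
    }

    /** Stand-in 64-bit hash (FNV-1a); an assumption for this sketch. */
    private static long fnv1a64(byte[] bytes) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : bytes) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    /** i-th derived index: (h1 + i * h2) mod n, using the low/high 32 bits. */
    private int index(long hash64, int i) {
        int h1 = (int) hash64;
        int h2 = (int) (hash64 >>> 32);
        return Math.floorMod(h1 + i * h2, n);
    }

    void addValue(byte[] value) {
        long h = fnv1a64(value);
        for (int i = 0; i < k; i++) bits.set(index(h, i));
    }

    ContainsResult contains(byte[] value) {
        long h = fnv1a64(value);
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(h, i))) {
                return ContainsResult.NO; // a clear bit proves the value was never added
            }
        }
        return ContainsResult.MAYBE;      // added, or a false positive from other values
    }

    public static void main(String[] args) {
        TinyFuzzySet set = new TinyFuzzySet(1024, 3);
        byte[] url = "https://example.org/a".getBytes(StandardCharsets.UTF_8);
        System.out.println(set.contains(url)); // NO: nothing recorded yet
        set.addValue(url);
        System.out.println(set.contains(url)); // MAYBE: a recorded value never yields NO
    }
}
```

The asymmetry follows directly from the add step: recording a value sets every one of its bits, so a recorded value can never produce NO, while unrelated values can set the same bits and cause a false MAYBE.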
Return the memory usage of this object in bytes. Negative values are illegal.
Serializes the data set to file using the following format: