CollectionStatistics
Contains statistics for a collection (field).
This class holds statistics across all documents for scoring purposes:
.maxDoc: number of documents.
.docCount: number of documents that contain this field.
.sumDocFreq: number of postings-list entries.
.sumTotalTermFreq: number of tokens.
The following conditions are always true:
All statistics are positive integers: never zero or negative.
docCount <= maxDoc
docCount <= sumDocFreq <= sumTotalTermFreq
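The conditions above can be expressed as a single predicate. The following is a minimal sketch, not part of the library: the class `StatsInvariants` and the method `isConsistent` are hypothetical names, and the four statistics are passed as plain `long` values rather than read from a real index.

```java
// Hypothetical helper; it only mirrors the documented conditions.
public class StatsInvariants {

    /** Returns true when the four values satisfy every condition listed above. */
    public static boolean isConsistent(long maxDoc, long docCount,
                                       long sumDocFreq, long sumTotalTermFreq) {
        // All statistics must be strictly positive.
        boolean allPositive = maxDoc > 0 && docCount > 0
                && sumDocFreq > 0 && sumTotalTermFreq > 0;
        return allPositive
                && docCount <= maxDoc               // docCount <= maxDoc
                && docCount <= sumDocFreq           // docCount <= sumDocFreq
                && sumDocFreq <= sumTotalTermFreq;  // sumDocFreq <= sumTotalTermFreq
    }

    public static void main(String[] args) {
        // 10 documents, 8 of which have the field, 40 postings, 120 tokens.
        System.out.println(isConsistent(10, 8, 40, 120));  // prints true
        // docCount greater than maxDoc violates the first inequality.
        System.out.println(isConsistent(10, 11, 40, 120)); // prints false
    }
}
```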
Values may include statistics on deleted documents that have not yet been merged away.
Be careful when performing calculations on these values: they are represented as 64-bit integers, so you may need to cast to double before dividing or combining them.
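As an illustration of why the cast matters, consider deriving an average field length from sumTotalTermFreq and docCount. This is a sketch under assumptions: `AverageFieldLength` and `averageFieldLength` are hypothetical names, and the inputs are plain `long` values standing in for the statistics.

```java
// Hypothetical helper showing integer versus floating-point division.
public class AverageFieldLength {

    /** Average number of tokens per document that has the field. */
    public static double averageFieldLength(long sumTotalTermFreq, long docCount) {
        // Cast one operand first so the division happens in floating point.
        return sumTotalTermFreq / (double) docCount;
    }

    public static void main(String[] args) {
        long sumTotalTermFreq = 7;
        long docCount = 2;
        // Dividing two longs truncates the fractional part.
        System.out.println(sumTotalTermFreq / docCount);                    // prints 3
        System.out.println(averageFieldLength(sumTotalTermFreq, docCount)); // prints 3.5
    }
}
```

Casting before the division, rather than casting the result, is what preserves the fractional part.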
Parameters
field: Field's name.
This value is never null.
maxDoc: The total number of documents, in the range [1 .. Long.MAX_VALUE], regardless of whether they all contain values for this field.
This value is always a positive number. See IndexReader#maxDoc().
docCount: The total number of documents that have at least one term for this field, in the range [1 .. .maxDoc].
This value is always a positive number, and never exceeds .maxDoc. See Terms#getDocCount().
sumTotalTermFreq: The total number of tokens for this field, in the range [.sumDocFreq .. Long.MAX_VALUE]. This is the "word count" for this field across all documents. It is the sum of TermStatistics.totalTermFreq across all terms. It is also the sum of each document's field length across all documents.
This value is always a positive number, and always at least .sumDocFreq. See Terms#getSumTotalTermFreq().
sumDocFreq: The total number of postings-list entries for this field, in the range [.docCount .. .sumTotalTermFreq]. This is the number of term-document pairs: the sum of TermStatistics.docFreq across all terms. It is also the sum of each document's unique term count for this field across all documents.
This value is always a positive number, always at least .docCount, and never exceeds .sumTotalTermFreq. See Terms#getSumDocFreq().
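To make the four definitions concrete, the sketch below computes them for a toy in-memory corpus. It does not use the library: `ToyFieldStats` and `compute` are hypothetical names, each document is assumed to be a list of already-tokenized terms for one field, and an empty list stands for a document with no terms in that field.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical toy model; real values come from the index, not from token lists.
public class ToyFieldStats {

    /** Returns {maxDoc, docCount, sumDocFreq, sumTotalTermFreq} for one field. */
    public static long[] compute(List<List<String>> docs) {
        long maxDoc = docs.size();      // every document counts, with or without the field
        long docCount = 0;
        long sumDocFreq = 0;
        long sumTotalTermFreq = 0;
        for (List<String> tokens : docs) {
            if (tokens.isEmpty()) {
                continue;               // no terms for this field in this document
            }
            docCount++;
            sumDocFreq += new HashSet<>(tokens).size(); // unique terms = postings entries
            sumTotalTermFreq += tokens.size();          // field length in tokens
        }
        return new long[] {maxDoc, docCount, sumDocFreq, sumTotalTermFreq};
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"), // 2 unique terms, 3 tokens
                Arrays.asList("b", "c"),      // 2 unique terms, 2 tokens
                Arrays.<String>asList());     // document without the field
        System.out.println(Arrays.toString(compute(docs))); // prints [3, 2, 4, 5]
    }
}
```

In this toy corpus the result satisfies docCount <= maxDoc and docCount <= sumDocFreq <= sumTotalTermFreq, as the ranges above require.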