Package-level declarations
Types
AbstractKnnCollector is the default implementation for a knn collector used for gathering kNN results and providing topDocs from the gathered neighbors
Uses KnnVectorsReader.search to perform nearest neighbour search.
Contains functionality common to both MultiTermQueryConstantScoreBlendedWrapper and MultiTermQueryConstantScoreWrapper. Internal implementation detail only. Not meant as an extension point for users.
A Query that will match terms against a finite-state machine.
A Query that blends index statistics across multiple terms. This is particularly useful when several terms should produce identical scores, regardless of their index statistics.
A clause in a BooleanQuery.
A Query that matches documents matching boolean combinations of other queries, e.g. [ ]s, PhraseQuerys or other BooleanQuerys.
Add this Attribute to a TermsEnum returned by and update the boost on each returned term. This makes it possible to control the boost factor for each matching term in or TopTermsRewrite mode. FuzzyQuery uses this to take the edit distance into account.
Implementation class for BoostAttribute.
A Query wrapper that applies a boost to the wrapped query. Boost values less than one give this query less importance than other queries, while values greater than one give more importance to the scores returned by this query.
This class is used to score a range of documents at once, and is returned by . Only queries that have a more optimized means of scoring across a range of documents need to override this. Otherwise, a default implementation is wrapped around the [ ] returned by Weight.scorer.
Search for all (approximate) byte vectors above a similarity threshold.
Caches all docs, and optionally also scores, coming from a search, and is then able to replay them to another collector. You specify the max RAM this class may use. Once the collection is done, call isCached. If this returns true, you can use replay against a new collector. If it returns false, this means too much RAM was required and you must instead re-run the original search.
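The cache-then-replay flow described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `ReplayCache` is a hypothetical class, and its `maxDocs` budget is an assumed stand-in for the RAM limit (counted in docs rather than bytes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for the caching idea: remember collected doc IDs up to
// a budget, then replay them to another consumer if the budget was not exceeded.
final class ReplayCache {
    private final int maxDocs;                      // assumed budget, in docs rather than bytes
    private List<Integer> docs = new ArrayList<>(); // null once the cache has been abandoned

    ReplayCache(int maxDocs) {
        this.maxDocs = maxDocs;
    }

    // Called once per hit during the first pass.
    void collect(int doc) {
        if (docs == null) {
            return;                                 // cache already abandoned
        }
        if (docs.size() >= maxDocs) {
            docs = null;                            // budget exceeded: drop the cache entirely
            return;
        }
        docs.add(doc);
    }

    // True if every collected hit is still held in the cache.
    boolean isCached() {
        return docs != null;
    }

    // Replays the cached hits, in collection order, to another consumer.
    void replay(IntConsumer other) {
        if (!isCached()) {
            throw new IllegalStateException("cache was abandoned; re-run the search");
        }
        for (int doc : docs) {
            other.accept(doc);
        }
    }
}
```

A caller would check `isCached()` after the first pass and fall back to re-running the search when it returns false.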
Like IntConsumer, but may throw checked exceptions.
Throw this exception in LeafCollector.collect to prematurely terminate collection of the current leaf.
A manager of collectors. This class is useful to parallelize execution of search requests and has two main methods:
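In Lucene's Java API the two methods are `newCollector` and `reduce`. The sketch below mirrors that shape with hypothetical types (`CountingManagerSketch`, `CountCollector`) and runs the slices sequentially; it only illustrates the pattern of one collector per slice followed by a merge step.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical miniature of the manager pattern: one collector per slice,
// then a reduce step that merges the per-slice results.
final class CountingManagerSketch {
    // A per-slice collector that only counts hits.
    static final class CountCollector {
        int count;

        void collect(int doc) {
            count++;
        }
    }

    // Creates a fresh collector per slice so no state is shared while collecting.
    static CountCollector newCollector() {
        return new CountCollector();
    }

    // Merges the per-slice collectors into a single result.
    static int reduce(Collection<CountCollector> collectors) {
        int total = 0;
        for (CountCollector c : collectors) {
            total += c.count;
        }
        return total;
    }

    // Applies the pattern to pre-split slices of doc IDs (sequentially, for simplicity).
    static int countAll(List<int[]> slices) {
        List<CountCollector> collectors = new ArrayList<>();
        for (int[] slice : slices) {
            CountCollector c = newCollector();
            for (int doc : slice) {
                c.collect(doc);
            }
            collectors.add(c);
        }
        return reduce(collectors);
    }
}
```

Because each slice has its own collector, the collection loop could run on separate threads without synchronization; only `reduce` sees all collectors.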
A Query that treats multiple fields as a single stream and scores terms as if they had been indexed in a single field whose values would be the union of the values of the provided fields.
Helper methods for building conjunction iterators
A query that wraps another query and returns a constant score equal to 1 for every document that matches the query. It therefore strips off all scores and always returns 1.
A constant-scoring Scorer.
Specialization of ScorerSupplier for queries that produce constant scores.
A Weight that has a constant score equal to the boost of the wrapped query. This is typically useful when building queries which do not produce meaningful scores and are mostly useful for filtering.
Utility class that runs a thread to manage periodic reopens of a ReferenceManager, with methods to wait for a specific index change to become visible. When a given search request needs to see a specific index change, call waitForGeneration to wait for that change to be visible. Note that this will only scale well if most searches do not need to wait for a specific index generation.
A priority queue of DocIdSetIterators that orders by current doc ID. This specialization is needed over PriorityQueue because the pluggable comparison function makes the rebalancing quite slow.
Wrapper used in DisiPriorityQueue.
A DocIdSetIterator which is a disjunction of the approximations of the provided iterators.
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries. This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as BooleanQuery would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery. The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields.
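The scoring rule described above (maximum subquery score plus a tie-breaking increment for the other matching subqueries) can be written as a small worked formula. The class and method names are hypothetical, and the sketch assumes non-negative subquery scores.

```java
// Hypothetical helper showing the score combination only, not the query itself.
final class DisMaxScoreSketch {
    // Returns max(subScores) + tieBreaker * (sum of all the other subScores).
    // Assumes every score is non-negative; an empty array yields 0.
    static double combine(double[] subScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }
}
```

With a tie breaker of 0 only the best field counts, and with a tie breaker of 1 the result equals the plain sum of the field scores.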
A DocIdSet contains a set of doc ids. Implementing classes need only implement iterator to provide access to the set.
This abstract class defines methods to iterate over a set of non-decreasing doc ids. Note that this class assumes it iterates on doc IDs, and therefore NO_MORE_DOCS is set to the maximum int value (2147483647) so that it can be used as a sentinel.
A stream of doc IDs. Most methods on DocIdStreams are terminal, meaning that the [ ] may not be further used.
Wrapper around a TwoPhaseIterator for a doc-values range query that speeds things up by taking advantage of a DocValuesSkipper.
Rewrites MultiTermQueries into a filter, using DocValues for term enumeration.
Per-segment, per-document double values, which can be calculated at search-time
Base class for producing DoubleValues
Expert: Find exact phrases
Expert: Describes the score computation for document and query.
Expert: a FieldComparator compares hits so as to determine their sort order when collecting the top results with TopFieldCollector. The concrete public FieldComparator classes here correspond to the SortField types.
Provides a FieldComparator for custom field sorting.
Expert: A ScoreDoc which also contains information about how to sort the referenced document. In addition to the document number and score, this object contains an array of values for the document from the field(s) used to sort. For example, if the sort criteria was to sort by fields "a", "b" then "c", the fields object array will have three elements, corresponding respectively to the term values for the document in fields "a", "b" and "c". The class of each element in the array will be either Integer, Float or String depending on the type of values in the terms of each field.
A Query that matches documents that contain either a KnnFloatVectorField, [ ] or a field that indexes norms or doc values.
Expert: A hit queue for sorting hits by terms in more than one field.
Collector delegator.
Wrapper around a DocIdSetIterator.
Abstract decorator class of a DocIdSetIterator implementation that provides on-demand filter/validation mechanism on an underlying DocIdSetIterator.
LeafCollector delegator.
Filter a Scorable, intercepting methods and optionally changing their return values
A FilterScorer contains another Scorer, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterScorer itself simply implements all abstract methods of Scorer with versions that pass all requests to the contained scorer. Subclasses of FilterScorer may further override some of these methods and may also provide additional methods and fields.
A FilterWeight contains another Weight and implements all abstract methods by delegating to the wrapped weight.
Search for all (approximate) float vectors above a similarity threshold.
Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter.
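To make the two distance definitions concrete, here is a textbook dynamic-programming sketch of the optimal string alignment distance, with a flag that falls back to classic Levenshtein. It only illustrates the measure; it is not a claim about how the query evaluates matches internally. `EditDistanceSketch` is a hypothetical name, and the sketch compares UTF-16 chars rather than code points.

```java
// Hypothetical helper illustrating the distance definition only.
final class EditDistanceSketch {
    // Optimal string alignment distance when transpositions is true,
    // classic Levenshtein distance when it is false.
    static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;                           // delete i chars
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;                           // insert j chars
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);   // substitution or match
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // swap of adjacent chars
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

For example, "ab" and "ba" are one edit apart when transpositions count as a single edit, and two edits apart under classic Levenshtein.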
Subclass of TermsEnum for enumerating all terms that are similar to the specified filter term.
DocIdSetIterator that skips non-competitive docs thanks to the indexed impacts. Call setMinCompetitiveScore in order to give this iterator the ability to skip low-scoring documents.
A query that uses either an index structure (points or terms) or doc values in order to run a query, depending which one is more efficient. This is typically useful for range queries, whose Weight.scorer is costly to create since it usually needs to sort large lists of doc ids. For instance, for a field that both indexed LongPoints and [ ]s with the same values, an efficient range query could be created by doing:
Implements search over a single IndexReader.
A range query that can take advantage of the fact that the index is sorted to speed up execution. If the index is sorted on the same field as the query, it performs binary search on the field's numeric doc values to find the documents at the lower and upper ends of the range.
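The binary-search idea can be sketched on a plain array. The sketch assumes an ascending index sort with exactly one value per document, so that position in the array equals doc ID; `SortedRangeSketch` and its methods are hypothetical names, not part of the library.

```java
// Hypothetical helper: valuesByDoc[doc] is the field value of doc, ascending by doc ID.
final class SortedRangeSketch {
    // Returns {firstDoc, endDocExclusive} for docs whose value lies in [lower, upper].
    static int[] matchingDocRange(long[] valuesByDoc, long lower, long upper) {
        int first = lowerBound(valuesByDoc, lower);   // first doc with value >= lower
        int end = upper == Long.MAX_VALUE
                ? valuesByDoc.length                  // avoid overflow of upper + 1
                : lowerBound(valuesByDoc, upper + 1); // first doc with value > upper
        return new int[] {first, Math.max(first, end)};
    }

    // Index of the first element >= key, or a.length if there is none.
    static int lowerBound(long[] a, long key) {
        int lo = 0;
        int hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Every doc ID in the returned half-open interval matches, so no per-document value check is needed once the two boundaries are known.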
A Query that matches documents matching combinations of subqueries.
Combines scores of subscorers. If a subscorer does not contain the docId, a smoothing score is calculated for that document/subscorer combination.
The Weight for IndriAndQuery, used to normalize, score and explain these queries.
The Indri implementation of a disjunction scorer which stores the subscorers for the child queries. The score and smoothingScore methods use the list of all subscorers and not just the matches so that a smoothingScore can be calculated if there is not an exact match.
A basic abstract query that all IndriQueries can extend to implement toString, equals, getClauses, and iterator.
The Indri parent scorer that stores the boost so that IndriScorers can use the boost outside of the term.
Uses KnnVectorsReader.search to perform nearest neighbour search.
KnnCollector is a knn collector used for gathering kNN results and providing topDocs from the gathered neighbors
Uses KnnVectorsReader.search to perform nearest neighbour search.
Collector decouples the score from the collected doc: the score computation is skipped entirely if it's not needed. Collectors that do need the score should implement the setScorer method, to hold onto the passed Scorer instance, and call Scorer.score within the collect method to compute the current hit's score. If your collector may request the score for a single hit multiple times, you should use ScoreCachingWrappingScorer.
Expert: comparator that gets instantiated on each leaf from a top-level FieldComparator instance.
Tracks live field values across NRT reader reopens. This holds a map for all updated ids since the last reader reopen. Once the NRT reader is reopened, it prunes the map. This means you must reopen your NRT reader periodically, otherwise the RAM consumption of this class will grow without bound.
Per-segment, per-document long values, which can be calculated at search-time
Base class for producing LongValues
A QueryCache that evicts queries using a LRU (least-recently-used) eviction policy in order to remain under a given maximum size and number of bytes used.
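An LRU policy bounded by both entry count and byte usage can be sketched with an access-ordered `LinkedHashMap`. This is a generic illustration with hypothetical names (`LruSketch`, string keys, byte-array values standing in for cached results), not the cache's actual data structures.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache limited by number of entries and total value bytes.
final class LruSketch {
    private final int maxEntries;
    private final long maxBytes;
    private long usedBytes;
    // accessOrder = true: iteration goes from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    LruSketch(int maxEntries, long maxBytes) {
        this.maxEntries = maxEntries;
        this.maxBytes = maxBytes;
    }

    // Looks up a value and marks the entry as most recently used.
    byte[] get(String key) {
        return entries.get(key);
    }

    // Adds or replaces an entry, then evicts until both limits hold again.
    void put(String key, byte[] value) {
        byte[] old = entries.put(key, value);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += value.length;
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while ((entries.size() > maxEntries || usedBytes > maxBytes) && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next(); // least recently used first
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    // Membership check that does not affect recency.
    boolean contains(String key) {
        return entries.containsKey(key);
    }

    int size() {
        return entries.size();
    }
}
```

Note that in this sketch a single value larger than `maxBytes` evicts everything, including itself.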
A query that matches all documents.
ScorerSupplier that matches all docs.
An iterator over match positions (and optionally offsets) for a single document and field
Contains static functions that aid the implementation of Matches and [ ] interfaces.
A query that matches no documents.
Add this Attribute to a fresh AttributeSource before calling . FuzzyQuery is using this to control its internal behaviour to only return competitive terms.
Implementation class for MaxNonCompetitiveBoostAttribute.
Maintains the maximum score and its corresponding document id concurrently
Compute maximum scores based on Impacts and keep them in a cache in order not to run expensive similarity score computations multiple times on the same data.
A composite CollectorManager which wraps a set of CollectorManager instances, akin to how MultiCollector wraps Collector instances.
Scorer that sums document's norms from multiple fields.
A generalized version of PhraseQuery, with the possibility of adding more than one term at the same position that are treated as a disjunction (OR). To use this class to search for the phrase "Microsoft app*" first create a Builder and use Builder.add on the term "microsoft" (assuming lowercase analysis), then find all terms that have "app" as prefix using LeafReader.terms, seeking to "app" then iterating and collecting terms until there is no longer that prefix, and finally use Builder.add to add them. returns the fully constructed (and immutable) MultiPhraseQuery.
An abstract Query that matches documents containing a subset of terms provided by a FilteredTermsEnum enumeration.
Utility class to help extract the set of sub queries that have matched from a larger query.
This is a PhraseQuery which is optimized for n-gram phrase query. For example, when you query "ABCD" on a 2-gram field, you may want to use NGramPhraseQuery rather than PhraseQuery, because NGramPhraseQuery will Query.rewrite the query to "AB/0 CD/2", while PhraseQuery will query "AB/0 BC/1 CD/2" (where term/position).
Base class for exact and sloppy phrase matching
Position of a term in a document that takes into account the term offset within the phrase.
A Query that matches documents containing a particular sequence of terms. A PhraseQuery is built by QueryParser for input like "new york".
Expert: Weight class for phrase matching
Abstract query class to find all documents whose single or multi-dimensional point values, previously indexed with e.g. IntPoint, are contained in the specified set.
Abstract class for range queries against single or multidimensional points such as [ ].
A Query that matches documents containing terms with a specified prefix. A PrefixQuery is built by QueryParser for input like app*.
Controls how LeafFieldComparator skips documents
A cache for queries.
A policy defining which filters should be cached.
A Rescorer that uses a provided Query to assign scores to the first-pass hits.
Allows recursion through a query tree
Utility class to safely share instances of a certain type across multiple threads, while periodically refreshing them. This class ensures each reference is closed only once all threads have finished using it. It is recommended to consult the documentation of ReferenceManager implementations for their maybeRefresh semantics.
A fast regular expression query based on the org.gnit.lucenekmp.util.automaton package.
Re-scores the topN results (TopDocs) from an original query. See QueryRescorer for an actual implementation. Typically, you run a low-cost first-pass query across the entire index, collecting the top few hundred hits perhaps, and then use this class to mix in a more costly second pass scoring.
A Scorer which wraps another scorer and caches the score of the current document. Successive calls to score will return the same result and will not invoke the wrapped Scorer's score() method, unless the current document has changed.
This class might be useful due to the changes done to the Collector interface, in which the score is not computed for a document by default, only if the collector requests it. Some collectors may need to use the score in several places, however all they have in hand is a [ ] object, and might end up computing the score of a document more than once.
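The caching behaviour described above can be sketched in a few lines. `CachedScoreSketch` is a hypothetical class, and the `IntToDoubleFunction` is an assumed stand-in for an expensive wrapped scorer; the `computations` counter exists only to make the effect observable.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical wrapper that computes the score at most once per current document.
final class CachedScoreSketch {
    private final IntToDoubleFunction expensiveScore; // stand-in for the wrapped scorer
    private int currentDoc = -1;
    private boolean cached;
    private double cachedScore;
    int computations;                                 // how often the wrapped function ran

    CachedScoreSketch(IntToDoubleFunction expensiveScore) {
        this.expensiveScore = expensiveScore;
    }

    // Moves to another document and invalidates the cached score if the doc changed.
    void setDoc(int doc) {
        if (doc != currentDoc) {
            currentDoc = doc;
            cached = false;
        }
    }

    // Returns the score of the current document, computing it only on first use.
    double score() {
        if (!cached) {
            cachedScore = expensiveScore.applyAsDouble(currentDoc);
            cached = true;
            computations++;
        }
        return cachedScore;
    }
}
```

A collector that reads the score in several places can call `score()` freely; the wrapped computation runs once per document.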
Base rewrite method that translates each term into a query, and keeps the scores as computed by the query.
Factory class used by SearcherManager to create new IndexSearchers. The default implementation just creates an IndexSearcher with no custom behavior:
Keeps track of current plus old IndexSearchers, closing the old ones once they have timed out.
Utility class to safely share IndexSearcher instances across multiple threads, while periodically reopening. This class ensures each searcher is closed only once all threads have finished using it.
This is a version of knn vector query that provides a query seed to initiate the vector search. NOTE: The underlying format is free to ignore the provided seed
Interface defining whether or not an object can be cached against a LeafReader
Base Collector implementation that is used to collect all contexts.
Base FieldComparator implementation that is used for all contexts.
Find all slop-valid position-combinations (matches) encountered while traversing/hopping the PhrasePositions.
The sloppy frequency contribution of a match depends on the distance:
Selects a value from the document's list to use as the representative value
SortField for SortedNumericDocValues.
Selects a value from the document's set to use as the representative value
SortField for SortedSetDocValues.
A Rescorer that re-sorts according to a provided Sort.
A query that treats multiple terms as synonyms.
Executor wrapper responsible for the execution of concurrent tasks. Used to parallelize search across segments as well as query rewrite in some cases. Exposes a single invokeAll method that takes a collection of Callables and executes them concurrently. Once all but one task have been submitted to the executor, it tries to run as many tasks as possible on the calling thread, then waits for all tasks that have been executed in parallel on the executor to be completed and then returns a list with the obtained results.
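One way to sketch the behaviour described above uses `java.util.concurrent.FutureTask`, whose `run()` returns immediately when the task has already been started or finished by another thread. `InvokeAllSketch` is a hypothetical class and a simplification; wrapping failures in `IllegalStateException` is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical sketch of "submit all but one, then help out on the calling thread".
final class InvokeAllSketch {
    static <T> List<T> invokeAll(Executor executor, Collection<Callable<T>> callables) {
        List<FutureTask<T>> tasks = new ArrayList<>();
        for (Callable<T> c : callables) {
            tasks.add(new FutureTask<>(c));
        }
        // Hand all but the last task to the executor.
        for (int i = 0; i < tasks.size() - 1; i++) {
            executor.execute(tasks.get(i));
        }
        // The caller runs whatever has not been started yet; run() is a no-op
        // for tasks that another thread has already started or completed.
        for (FutureTask<T> t : tasks) {
            t.run();
        }
        // Wait for tasks still running on the executor and gather results in order.
        List<T> results = new ArrayList<>();
        for (FutureTask<T> t : tasks) {
            try {
                results.add(t.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }
}
```

Because the calling thread always participates, the call still makes progress when the executor is saturated.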
Specialization for a disjunction over many terms that, by default, behaves like a [ ] over a BooleanQuery containing only clauses.
A Query that matches documents containing a term. This may be combined with other terms with a BooleanQuery.
A Query that matches documents within a range of terms.
Expert: A Scorer for documents matching a Term.
Contains statistics for a specific term
A KnnCollectorManager that collects results with a timeout.
A base class for all collectors that return a TopDocs output. This collector allows easy extension by providing a single constructor which accepts a PriorityQueue as well as protected members for that priority queue and a counter of the number of total hits.
Extending classes can override any of the methods to provide their own implementation, and can avoid the priority queue entirely by passing null to the TopDocsCollector constructor. In that case, however, consider overriding all methods in order to avoid a NullPointerException.
A Collector that sorts by SortField using FieldComparators.
Create a TopFieldCollectorManager which uses a shared hit counter to maintain number of hits and a shared MaxScoreAccumulator to propagate the minimum score across segments if the primary sort is by relevancy.
Represents hits returned by IndexSearcher.search.
TopKnnCollector is a specific KnnCollector. A minHeap is used to keep track of the currently collected vectors allowing for efficient updates as better vectors are collected.
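The min-heap bookkeeping can be sketched as follows. The names (`TopKSketch`, `Neighbor`, `minCompetitiveScore`) are hypothetical and the sketch assumes that a higher score means a better neighbor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-k tracker: the heap head is always the worst neighbor kept so far.
final class TopKSketch {
    static final class Neighbor {
        final int doc;
        final float score;

        Neighbor(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int k;
    private final PriorityQueue<Neighbor> heap =
            new PriorityQueue<>((Neighbor a, Neighbor b) -> Float.compare(a.score, b.score));

    TopKSketch(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    // Offers a candidate; returns true if it was kept.
    boolean collect(int doc, float score) {
        if (heap.size() < k) {
            heap.add(new Neighbor(doc, score));
            return true;
        }
        if (score > heap.peek().score) {
            heap.poll();                        // drop the current worst
            heap.add(new Neighbor(doc, score));
            return true;
        }
        return false;
    }

    // Score a new candidate must beat once k neighbors have been collected.
    float minCompetitiveScore() {
        return heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek().score;
    }

    // Doc IDs of the kept neighbors, best score first.
    int[] topDocs() {
        List<Neighbor> sorted = new ArrayList<>(heap);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        int[] docs = new int[sorted.size()];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = sorted.get(i).doc;
        }
        return docs;
    }
}
```

Keeping the worst retained neighbor at the head makes both the admission check and the replacement cheap, at O(log k) per accepted candidate.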
A Collector implementation that collects the top-scoring hits, returning them as a [ ]. This is used by IndexSearcher to implement TopDocs-based search. Hits are sorted by score descending and then (when the scores are tied) docID ascending. When you create an instance of this collector you should know in advance whether documents are going to be collected in doc Id order or not.
Create a TopScoreDocCollectorManager which uses a shared hit counter to maintain number of hits and a shared MaxScoreAccumulator to propagate the minimum score across segments
Base rewrite method for collecting only the top terms via a priority queue.
Just counts the total number of hits. This is the collector behind IndexSearcher.count. When the Weight implements Weight.count, this collector will skip collecting segments.
Collector manager based on TotalHitCountCollector that allows users to parallelize counting the number of hits, expected to be used mostly wrapped in MultiCollectorManager. For cases when this is the only collector manager used, IndexSearcher.count should be called instead of IndexSearcher.search as the former is faster whenever the count can be returned directly from the index statistics.
Description of the total number of hits of a query. The total hit count can't generally be computed accurately without visiting all matches, which is costly for queries that match lots of documents. Given that it is often enough to have a lower bound on the number of hits, such as "there are more than 1000 hits", Lucene has options to stop counting as soon as a threshold has been reached in order to improve query times.
Returned by Scorer.twoPhaseIterator to expose an approximation of a [ ]. When the approximation's DocIdSetIterator.nextDoc or DocIdSetIterator.advance returns, matches needs to be checked in order to know whether the returned doc ID actually matches.
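The two-phase contract (advance a cheap approximation first, then confirm with a more expensive check) can be sketched like this. `TwoPhaseSketch`, `nextCandidate` and the `IntPredicate` verifier are hypothetical stand-ins, and the candidate array is assumed to be sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical two-phase iteration over sorted candidate doc IDs.
final class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel marking exhaustion

    private final int[] candidates;      // cheap, possibly over-inclusive approximation
    private final IntPredicate verifier; // expensive exact check
    private int index = -1;

    TwoPhaseSketch(int[] candidates, IntPredicate verifier) {
        this.candidates = candidates;
        this.verifier = verifier;
    }

    // Phase 1: advance the approximation to its next candidate.
    int nextCandidate() {
        index++;
        return index < candidates.length ? candidates[index] : NO_MORE_DOCS;
    }

    // Phase 2: confirm the candidate the approximation is currently positioned on.
    boolean matches() {
        return verifier.test(candidates[index]);
    }

    // Drives both phases and returns only the confirmed doc IDs.
    static List<Integer> confirmedDocs(int[] candidates, IntPredicate verifier) {
        TwoPhaseSketch it = new TwoPhaseSketch(candidates, verifier);
        List<Integer> hits = new ArrayList<>();
        for (int doc = it.nextCandidate(); doc != NO_MORE_DOCS; doc = it.nextCandidate()) {
            if (it.matches()) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Splitting the work this way lets a caller intersect several approximations first and run the expensive check only on documents that survive.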
A QueryCachingPolicy that tracks usage statistics of recently-used filters in order to decide on which filters are worth caching.
Computes the similarity score between a given query vector and different document vectors. This is used for exact searching and scoring
Perform a similarity-based graph search.
Expert: Calculate query weights and build query scorers.
Implements the wildcard search query. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. '\' is the escape character.
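The wildcard semantics can be illustrated with a small standalone matcher. This is a hypothetical sketch (`WildcardSketch`), not the query's implementation: it writes the single-character wildcard as `?`, as in Lucene's WildcardQuery, works on UTF-16 chars, and treats a trailing lone backslash as a literal backslash, which is an assumption of this sketch.

```java
// Hypothetical matcher for patterns using *, ? and backslash escapes.
final class WildcardSketch {
    static boolean matches(String pattern, String text) {
        // Tokenize: kinds[t] is '*', '?' or 'c' (a literal stored in lits[t]).
        int n = 0;
        char[] kinds = new char[pattern.length()];
        char[] lits = new char[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                kinds[n] = 'c';
                lits[n] = pattern.charAt(++i);   // escaped char is taken literally
            } else if (c == '*' || c == '?') {
                kinds[n] = c;
            } else {
                kinds[n] = 'c';
                lits[n] = c;
            }
            n++;
        }
        // ok[j] is true when the tokens processed so far can match text[0..j).
        boolean[] ok = new boolean[text.length() + 1];
        ok[0] = true;
        for (int t = 0; t < n; t++) {
            boolean[] next = new boolean[text.length() + 1];
            for (int j = 0; j <= text.length(); j++) {
                if (kinds[t] == '*') {
                    // Either the star matches nothing, or it absorbs one more char.
                    next[j] = ok[j] || (j > 0 && next[j - 1]);
                } else if (j > 0 && ok[j - 1]) {
                    next[j] = kinds[t] == '?' || lits[t] == text.charAt(j - 1);
                }
            }
            ok = next;
        }
        return ok[text.length()];
    }
}
```

For instance, the pattern `app*` accepts "apple" and "app", while `a?c` accepts "abc" but not "ac".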