Filtering is a mechanism of narrowing the search space, allowing only a subset of the documents to be considered as possible hits. They can be used to implement search-within-search features to successively search within a previous set of results or to constrain the document search space. A security filter allows users to only see search results of documents they “own,” even if their query technically matches other documents that are off limits; we provide an example of a security filter in section 5.6.7.
You can filter any Lucene search using the overloaded search methods that accept a Filter instance. There are numerous built-in filter implementations:
Before you get concerned about mentions of caching results, rest assured that it’s done with a tiny data structure (a DocIdBitSet) where each bit position represents a document. Consider also the alternative to using a filter: aggregating required clauses in a BooleanQuery. In this section, we’ll discuss each of the built-in filters as well as the BooleanQuery alternative.
TermRangeFilter filters on a range of terms in a specific field, just like TermRangeQuery minus the scoring. If the field is numeric, you should use NumericRangeFilter(described next) instead. TermRangeFilter applies to textual fields.
Let’s look at title filtering as an example, shown in listing 5.12. We use the MatchAllDocsQuery as our query, and then apply a title filter to it.
- Listing 5.12 Using TermRangeFilter to filter by title
OPEN-ENDED RANGE FILTERING
TermRangeFilter also supports open-ended ranges. To filter on ranges with one end of the range specified and the other end open, just pass null for whichever end should be open:
NumericRangeFilter filters by numeric value. This is just like NumericRangeQuery, minus the constant scoring:
FieldCacheRangeFilter is another option for range filtering. It achieves exactly the same filtering as both TermRangeFilter and NumericRangeFilter, but does so by using Lucene’s field cache. This may result in faster performance in certain situations, since all values are preloaded into memory. But the usual caveats with field cache apply, as described in section 5.1.
This filter exposes a different API to achieve range filtering. Here’s how to do the same filtering on title2 that we did with TermRangeFilter:
- Filtering by specific terms
Sometimes you’d simply like to select specific terms to include in your filter. For example, perhaps your documents have Country as a field, and your search interface presents a checkbox allowing the user to pick and choose which countries to include in the search. There are two ways to achieve this.
The first approach is FieldCacheTermsFilter, which uses field cache under the hood. (Be sure to read section 5.1 for the trade-offs of the field cache.) Simply instantiate it with the field (String) and an array of String:
The second approach for filtering by terms is TermsFilter, which is included in Lucene’s contrib modules and is described in more detail in section 8.6.4. TermsFilterdoesn’t do any internal caching, and it allows filtering on fields that have more than one term; otherwise, TermsFilter and FieldCacheTermsFilter are functionally identical. It’s best to test both approaches for your application to see if there are any significant performance differences.