Lucene’s relevance scoring formula, which we discussed in chapter 3, does a great job of assigning relevance to each document based on how well it matches the query. But what if you’d like to modify or override how this scoring is done? In section 5.2 you saw how you can change the default relevance sorting to sort instead by one or more fields, but what if you need even more flexibility? This is where function queries come in.
Function queries give you the freedom to programmatically assign scores to matching documents using your own logic. All classes are from the org.apache.lucene.search.function package. In this section we first introduce the main classes used by function queries, and then look at a real-world example: using function queries to boost recently modified documents.
Function query classes:
The base class for all function queries is ValueSourceQuery. This is a query that matches all documents but sets the score of each document according to a ValueSource provided during construction. The function package provides FieldCacheSource, and its subclasses, to derive values from the field cache. You can also create your own ValueSource (for example, to derive scores from an external database). But probably the simplest approach is to use FieldScoreQuery, which subclasses ValueSourceQuery and derives each document’s score statically from a specific indexed field. The field should be a number, indexed without norms and with a single token per document. Typically you’d use Field.Index.NOT_ANALYZED_NO_NORMS. Let’s look at a simple example. First, include the field “score” in your documents:
Our example is somewhat contrived; you could simply sort by the score field, descending, to achieve the same results. But function queries get more interesting when you combine them using the second type of function query, CustomScoreQuery. This query class lets you combine a normal Lucene query with one or more other function queries. We can now use the FieldScoreQuery we created earlier and a CustomScoreQuery to compute our own score:
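The arithmetic behind this kind of combination can be sketched without Lucene at all. The plain-Java model below is an illustration only: the class ScoreCombiner and its method names are hypothetical, and it assumes that the default combination multiplies the normal query's score by the function query's value, which you should confirm against the CustomScoreQuery Javadoc for your Lucene version. The second method shows the sort of alternative formula you might substitute.

```java
/** Plain-Java model of combining a text score with a per-document value. */
class ScoreCombiner {

    // Assumed default behavior: multiply the two scores together.
    static float multiply(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    // One possible custom formula: dampen the text score with a square
    // root so the per-document value carries relatively more weight.
    static float sqrtDampened(float subQueryScore, float valSrcScore) {
        return (float) (Math.sqrt(subQueryScore) * valSrcScore);
    }

    public static void main(String[] args) {
        float textScore = 0.25f;   // hypothetical relevance score
        float fieldScore = 4.0f;   // hypothetical value of the "score" field
        System.out.println(multiply(textScore, fieldScore));
        System.out.println(sqrtDampened(textScore, fieldScore));
    }
}
```

In real code this logic would live in a subclass of CustomScoreQuery rather than in free-standing static methods; the model only isolates the formula so you can reason about it separately.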
Boosting recently modified documents using function queries:
A real-world use of CustomScoreQuery is to perform document boosting. You can boost according to any custom criteria, but for our example, shown in listing 5.15, we boost recently modified documents using a new custom query class, RecencyBoostingQuery. In applications where documents have a clear timestamp, such as searching a newsfeed or press releases, boosting by recency can be useful. The class requires you to specify the name of a numeric field that contains the timestamp of each document that you’d like to use for boosting.
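To make the idea concrete, here is a minimal sketch of one plausible recency formula, written in plain Java so it runs without Lucene. It is not taken from the RecencyBoostingQuery listing: the class RecencyBoost and the parameters multiplier and maxDaysAgo are assumptions chosen for illustration. The sketch boosts documents modified within a window of maxDaysAgo days, with a boost that shrinks linearly as the document ages, and leaves older documents untouched.

```java
/** Plain-Java model of a linear recency boost (illustrative only). */
class RecencyBoost {

    /**
     * Returns the boosted score for a document that is daysAgo days old.
     * Documents younger than maxDaysAgo are multiplied by (1 + boost),
     * where boost falls linearly from multiplier (age 0) toward 0.
     * Documents at or beyond maxDaysAgo keep their original score.
     */
    static float boostedScore(float score, int daysAgo,
                              double multiplier, int maxDaysAgo) {
        if (daysAgo < maxDaysAgo) {
            double boost = multiplier * (maxDaysAgo - daysAgo) / maxDaysAgo;
            return (float) (score * (1.0 + boost));
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical settings: up to 2x extra weight over a two-year window.
        double multiplier = 2.0;
        int maxDaysAgo = 2 * 365;
        System.out.println(boostedScore(1.0f, 0, multiplier, maxDaysAgo));
        System.out.println(boostedScore(1.0f, 365, multiplier, maxDaysAgo));
        System.out.println(boostedScore(1.0f, 1000, multiplier, maxDaysAgo));
    }
}
```

With these settings a brand-new document has its score tripled, a one-year-old document has it doubled, and anything older than two years is unchanged. A linear falloff is only one choice; an exponential decay would be an equally reasonable design if you want recent documents to dominate more sharply.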
Listing 5.15 Using recency to boost search results
Once the index is set up, using RecencyBoostingQuery is straightforward, as shown in listing 5.16.
Listing 5.16 Testing recency boosting
If instead you run the search with q2, which boosts each result by recency, you’ll see this:
You can see that in the unboosted query, the top two results were tied based on relevance. But after factoring in recency boosting, the scores were different and the sort order changed.
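The tie-breaking effect is easy to reproduce in isolation. The following plain-Java sketch uses made-up hits and a hypothetical linear boost (the class RecencyRerank, the Hit type, and the parameters multiplier and maxDaysAgo are all assumptions, not part of Lucene) to show two documents with identical relevance scores changing places once recency is factored in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of how a recency boost reorders tied results. */
class RecencyRerank {

    // Hypothetical search hit: a title, a relevance score, and an age in days.
    static final class Hit {
        final String title;
        final float score;
        final int daysAgo;

        Hit(String title, float score, int daysAgo) {
            this.title = title;
            this.score = score;
            this.daysAgo = daysAgo;
        }
    }

    // Linear boost inside the window; unchanged score outside it.
    static float boosted(Hit h, double multiplier, int maxDaysAgo) {
        if (h.daysAgo < maxDaysAgo) {
            double boost = multiplier * (maxDaysAgo - h.daysAgo) / maxDaysAgo;
            return (float) (h.score * (1.0 + boost));
        }
        return h.score;
    }

    // Returns titles ordered by boosted score, highest first.
    // List.sort is stable, so tied hits keep their original order.
    static List<String> rank(List<Hit> hits, double multiplier, int maxDaysAgo) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> boosted(h, multiplier, maxDaysAgo)).reversed());
        List<String> titles = new ArrayList<>();
        for (Hit h : sorted) {
            titles.add(h.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Older book", 1.0f, 1000));
        hits.add(new Hit("Newer book", 1.0f, 10));
        System.out.println(rank(hits, 0.0, 730)); // multiplier 0: no boost
        System.out.println(rank(hits, 2.0, 730)); // recency boost applied
    }
}
```

With a multiplier of 0 the two hits stay tied and keep their original order; with a positive multiplier the more recently modified hit moves to the top.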
This wraps up our coverage of function queries. Although we focused on one compelling example, boosting relevance scoring according to recency, function queries open up a whole universe of possibilities. You’re completely free to implement whatever scoring you’d like.