Spans near one another:
A PhraseQuery (see section 3.4.6) matches documents that have terms near one another, with a slop factor to allow for intermediate or reversed terms. SpanNearQuery operates similarly to PhraseQuery, with some important differences. SpanNearQuery matches spans that are within a certain number of positions of one another, with a separate flag indicating whether the spans must appear in the order specified or can be reversed. Each resulting match spans from the start position of the first span to the end position of the last span. An example of a SpanNearQuery given three SpanTermQuery objects is shown in figure 5.3.
Using SpanTermQuery objects as the SpanQuerys in a SpanNearQuery is much like using a PhraseQuery. The SpanNearQuery slop factor is a bit less confusing than the PhraseQuery slop factor because it doesn’t require at least two additional positions to account for a reversed span. To allow reversed matches with a SpanNearQuery, set the inOrder flag (the third argument to the constructor) to false. Listing 5.10 demonstrates a few variations of SpanNearQuery and shows it in relation to PhraseQuery.
Listing 5.10 Finding matches near one another using SpanNearQuery
    public void testSpanNearQuery() throws Exception {
        // (1) Three terms in order with no slop: no match
        SpanQuery[] quick_brown_dog = new SpanQuery[] { quick, brown, dog };
        SpanNearQuery snq = new SpanNearQuery(quick_brown_dog, 0, true);
        assertNoMatches(snq);
        dumpSpans(snq);

        // (2) Same terms with a slop of 4: still no match
        snq = new SpanNearQuery(quick_brown_dog, 4, true);
        assertNoMatches(snq);
        dumpSpans(snq);

        // (3) A slop of 5 is enough for the SpanNearQuery to match
        snq = new SpanNearQuery(quick_brown_dog, 5, true);
        assertOnlyBrownFox(snq);
        dumpSpans(snq);

        // (4) Terms in reverse order with inOrder set to false.
        // Interesting: even a sloppy phrase query would require
        // more slop to match
        snq = new SpanNearQuery(new SpanQuery[] { lazy, fox }, 3, false);
        assertOnlyBrownFox(snq);
        dumpSpans(snq);

        // (5) Comparable PhraseQuery with a slop of 4: no match
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("f", "lazy"));
        pq.add(new Term("f", "fox"));
        pq.setSlop(4);
        assertNoMatches(pq);

        // (6) The PhraseQuery needs a slop of 5 to match
        pq.setSlop(5);
        assertOnlyBrownFox(pq);
    }
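To make the slop values in listing 5.10 easier to follow, here is a minimal, Lucene-free sketch of the arithmetic. It rests on assumptions that this section does not state: that the matching document is the sentence "the quick brown fox jumps over the lazy dog" split on whitespace, and that only the first occurrence of each term matters. The class SpanSlopSketch and the method requiredSlop are hypothetical names invented for illustration; this is not how Lucene implements SpanNearQuery.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of SpanNearQuery slop over single-term spans; not Lucene code. */
final class SpanSlopSketch {

    /**
     * Returns the smallest slop at which the terms would match as a near-span,
     * or -1 if a term is missing or the inOrder requirement is violated.
     * Simplification: only the first occurrence of each term is considered.
     */
    static int requiredSlop(List<String> tokens, boolean inOrder, String... terms) {
        if (terms.length == 0) {
            return -1;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        int previous = -1;
        for (String term : terms) {
            int position = tokens.indexOf(term);
            if (position < 0) {
                return -1; // term absent from the document
            }
            if (inOrder && position <= previous) {
                return -1; // terms are not in the order specified
            }
            previous = position;
            min = Math.min(min, position);
            max = Math.max(max, position);
        }
        // Width of the enclosing span minus the positions the terms occupy.
        return (max - min + 1) - terms.length;
    }

    public static void main(String[] args) {
        // Assumed document text; the section itself does not show it.
        List<String> doc =
            Arrays.asList("the quick brown fox jumps over the lazy dog".split(" "));
        System.out.println(requiredSlop(doc, true, "quick", "brown", "dog")); // 5
        System.out.println(requiredSlop(doc, false, "lazy", "fox"));          // 3
        System.out.println(requiredSlop(doc, true, "lazy", "fox"));           // -1
    }
}
```

Under these assumptions the numbers line up with the listing: the ordered three-term query needs a slop of 5, and the unordered lazy/fox query needs only 3, with no extra positions charged for the reversal.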
We’ve only shown SpanNearQuery with nested SpanTermQuerys, but SpanNearQuery allows for any SpanQuery type. A more sophisticated SpanNearQuery example is demonstrated later in listing 5.11 in conjunction with SpanOrQuery. Next we visit SpanNotQuery.
Excluding span overlap from matches:
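The SpanNotQuery excludes matches where one SpanQuery overlaps another. The first argument to the SpanNotQuery constructor is the span to include, and the second argument is the span to exclude. The following code demonstrates: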
    public void testSpanNotQuery() throws Exception {
        // quick followed by fox, with at most one position in between
        SpanNearQuery quick_fox = new SpanNearQuery(new SpanQuery[] { quick, fox }, 1, true);
        assertBothFoxes(quick_fox);
        dumpSpans(quick_fox);

        // Exclude quick_fox spans that overlap dog: none do
        SpanNotQuery quick_fox_dog = new SpanNotQuery(quick_fox, dog);
        assertBothFoxes(quick_fox_dog);
        dumpSpans(quick_fox_dog);

        // Exclude quick_fox spans that overlap red
        SpanNotQuery no_quick_red_fox = new SpanNotQuery(quick_fox, red);
        assertOnlyBrownFox(no_quick_red_fox);
        dumpSpans(no_quick_red_fox);
    }
The SpanNearQuery matched both documents because both have quick and fox within one position of each other. The first SpanNotQuery, quick_fox_dog, continues to match both documents because the quick_fox span doesn’t overlap dog. The second SpanNotQuery, no_quick_red_fox, excludes the second document because red overlaps the quick_fox span. Notice that the resulting span matches are the originally included spans. The excluded span is used only to determine whether there’s an overlap and doesn’t factor into the resulting span matches.
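The overlap rule can be sketched without Lucene. The example below assumes spans are half-open position ranges (start inclusive, end exclusive) and that the two documents are the sentences "the quick brown fox jumps over the lazy dog" and "the quick red fox jumps over the sleepy cat", neither of which is shown in this section. SpanNotSketch, Span, and spanNot are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of SpanNotQuery's overlap test; not Lucene code. */
final class SpanNotSketch {

    /** A span over token positions: start inclusive, end exclusive. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True when the two ranges share at least one position. */
        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Keeps each included span, unchanged, unless an excluded span overlaps it. */
    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : include) {
            boolean overlapped = false;
            for (Span blocker : exclude) {
                if (candidate.overlaps(blocker)) {
                    overlapped = true;
                    break;
                }
            }
            if (!overlapped) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed positions: quick=1 and fox=3 in both documents.
        List<Span> quickFox = Arrays.asList(new Span(1, 4));
        List<Span> dog = Arrays.asList(new Span(8, 9)); // brown-fox document
        List<Span> red = Arrays.asList(new Span(2, 3)); // red-fox document
        System.out.println(spanNot(quickFox, dog)); // [[1,4)]
        System.out.println(spanNot(quickFox, red)); // []
    }
}
```

Note that the surviving span is returned exactly as it was included; the excluded spans act only as a filter.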
SpanOrQuery:
Finally, let’s talk about SpanOrQuery, which aggregates an array of SpanQuerys. Our example query, in English, is all documents that have “quick fox” near “lazy dog” or that have “quick fox” near “sleepy cat.” The first clause of this query is shown in figure 5.4. This single clause is a SpanNearQuery nesting two SpanNearQuery objects, each of which consists of two SpanTermQuerys.
Our test case becomes a bit lengthier because of all the sub-SpanQuerys that have to be built first:
Listing 5.11 Taking the union of two span queries using SpanOrQuery
    public void testSpanOrQuery() throws Exception {
        SpanNearQuery quick_fox = new SpanNearQuery(new SpanQuery[] { quick, fox }, 1, true);
        SpanNearQuery lazy_dog = new SpanNearQuery(new SpanQuery[] { lazy, dog }, 0, true);
        SpanNearQuery sleepy_cat = new SpanNearQuery(new SpanQuery[] { sleepy, cat }, 0, true);

        SpanNearQuery qf_near_ld = new SpanNearQuery(new SpanQuery[] { quick_fox, lazy_dog }, 3, true);
        assertOnlyBrownFox(qf_near_ld);
        dumpSpans(qf_near_ld);

        SpanNearQuery qf_near_sc = new SpanNearQuery(new SpanQuery[] { quick_fox, sleepy_cat }, 3, true);
        dumpSpans(qf_near_sc);

        SpanOrQuery or = new SpanOrQuery(new SpanQuery[] { qf_near_ld, qf_near_sc });
        assertBothFoxes(or);
        dumpSpans(or);
    }
Two SpanNearQuerys are created to match “quick fox” near “lazy dog” (qf_near_ld) and “quick fox” near “sleepy cat” (qf_near_sc) using nested SpanNearQuerys made up of SpanTermQuerys at the lowest level. Finally, these two SpanNearQuery instances are combined within a SpanOrQuery, which aggregates all matching spans.
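As a rough illustration of what "aggregates all matching spans" means, the sketch below merges the spans produced by two clauses into one ordered list. It is a simplified model, not Lucene's implementation: the document IDs 0 and 1, the half-open span positions, and the names SpanOrSketch, Span, and spanOr are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of SpanOrQuery as a union of span lists; not Lucene code. */
final class SpanOrSketch {

    /** A span in a document: start inclusive, end exclusive. */
    static final class Span {
        final int doc;
        final int start;
        final int end;

        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "doc" + doc + "[" + start + "," + end + ")";
        }
    }

    /** Collects the spans of every clause, ordered by document, start, then end. */
    static List<Span> spanOr(List<List<Span>> clauses) {
        List<Span> all = new ArrayList<>();
        for (List<Span> clause : clauses) {
            all.addAll(clause);
        }
        all.sort(Comparator.comparingInt((Span s) -> s.doc)
            .thenComparingInt(s -> s.start)
            .thenComparingInt(s -> s.end));
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical matches: "quick fox" near "lazy dog" in doc 0,
        // "quick fox" near "sleepy cat" in doc 1.
        List<Span> qfNearLd = Arrays.asList(new Span(0, 1, 9));
        List<Span> qfNearSc = Arrays.asList(new Span(1, 1, 9));
        System.out.println(spanOr(Arrays.asList(qfNearSc, qfNearLd)));
        // [doc0[1,9), doc1[1,9)]
    }
}
```

Neither clause matches both documents on its own, but their union does, which is what assertBothFoxes checks in listing 5.11.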
SpanQuery and QueryParser:
QueryParser doesn’t currently support any of the SpanQuery types, but the surround QueryParser in Lucene’s contrib modules does. We cover the surround parser in section 9.6.
Recall from section 3.4.6 that PhraseQuery is impartial to term order when enough slop is specified. Interestingly, you can easily extend QueryParser to use a SpanNearQuery with SpanTermQuery clauses instead, and force phrase queries to match only fields with the terms in the same order as specified. We demonstrate this technique in section 6.3.5.
Supplement:
* Ch5. Advanced search techniques - Span queries (1)
* Ch5. Advanced search techniques - Span queries (2)
This message was edited 20 times. Last update was at 14/05/2013 10:26:19