public class PhraseHelper
extends java.lang.Object
Helps the FieldOffsetStrategy
with strict position highlighting (e.g. highlight phrases correctly).
This is a stateful class holding information about the query, but it can be (and is) re-used across highlighting
documents. Despite this state, it is immutable after construction. The approach taken in this class is very similar
to the standard Highlighter's WeightedSpanTermExtractor,
which is in fact re-used here. However, we ought to
completely rewrite it to use the SpanCollector interface to collect offsets directly. We'll get better
phrase accuracy.

Modifier and Type | Class and Description |
---|---|
private static class | PhraseHelper.CachedSpans: A Spans based on a list of cached spans for one doc. |
private class | PhraseHelper.FieldFilteringTermSet: Simple TreeSet that filters out Terms not matching the provided predicate on add(). |
(package private) static class | PhraseHelper.MultiSpans: A single Spans view over multiple spans. |
(package private) static class | PhraseHelper.SingleFieldFilterLeafReader: Needed to support the ability to highlight a query irrespective of the field a query refers to (aka requireFieldMatch=false). |

Modifier and Type | Field and Description |
---|---|
private java.util.function.Predicate<java.lang.String> | fieldMatcher |
private java.lang.String | fieldName |
static PhraseHelper | NONE |
private java.util.Set<Term> | positionInsensitiveTerms |
private java.util.Set<SpanQuery> | spanQueries |
private static java.util.Comparator<? super Spans> | SPANS_COMPARATOR |
private boolean | willRewrite |

Constructor and Description |
---|
PhraseHelper(Query query, java.lang.String field, java.util.function.Predicate<java.lang.String> fieldMatcher, java.util.function.Function<SpanQuery,java.lang.Boolean> rewriteQueryPred, java.util.function.Function<Query,java.util.Collection<Query>> preExtractRewriteFunction, boolean ignoreQueriesNeedingRewrite): Constructor. |

Modifier and Type | Method and Description |
---|---|
(package private) java.util.List<BytesRef> | expandTermsIfRewrite(BytesRef[] terms, java.util.Map<BytesRef,Spans> strictPhrasesTermToSpans): Returns terms as a List, but expanded to include any terms in the keySet of strictPhrasesTermToSpans, if present. |
(package private) PostingsEnum | filterPostings(BytesRef term, PostingsEnum postingsEnum, Spans spans): Returns a filtered postings where the position must be in the given Spans. |
(package private) java.util.Set<SpanQuery> | getSpanQueries() |
(package private) java.util.Map<BytesRef,Spans> | getTermToSpans(LeafReader leafReader, int doc): Collect a list of pre-positioned Spans for each term, given a reader that has just one document. |
private void | getTermToSpans(SpanQuery spanQuery, LeafReaderContext readerContext, int doc, java.util.Map<BytesRef,Spans> result) |
(package private) boolean | hasPositionSensitivity(): If there is no position sensitivity then this instance can be ignored. |
(package private) boolean | willRewrite(): Rewrite is needed for handling a SpanMultiTermQueryWrapper (MTQ / wildcards) or some custom things. |

public static final PhraseHelper NONE
private static final java.util.Comparator<? super Spans> SPANS_COMPARATOR
private final java.lang.String fieldName
private final java.util.Set<Term> positionInsensitiveTerms
private final java.util.Set<SpanQuery> spanQueries
private final boolean willRewrite
private final java.util.function.Predicate<java.lang.String> fieldMatcher
public PhraseHelper(Query query, java.lang.String field, java.util.function.Predicate<java.lang.String> fieldMatcher, java.util.function.Function<SpanQuery,java.lang.Boolean> rewriteQueryPred, java.util.function.Function<Query,java.util.Collection<Query>> preExtractRewriteFunction, boolean ignoreQueriesNeedingRewrite)
rewriteQueryPred is an extension hook to override the default choice of
WeightedSpanTermExtractor.mustRewriteQuery(SpanQuery). By default unknown query types are rewritten,
so use this to return Boolean.FALSE if you know the query doesn't need to be rewritten.
Similarly, preExtractRewriteFunction is also an extension hook for extract to allow different queries
to be set before the WeightedSpanTermExtractor's extraction is invoked.
ignoreQueriesNeedingRewrite effectively ignores any query clause that needs to be "rewritten", which is
usually limited to just a SpanMultiTermQueryWrapper but could be other custom ones.
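
The way rewriteQueryPred overrides the default choice can be sketched without Lucene on the classpath. This is a simplified model, not the library's code: query types are plain strings, defaultMustRewrite is a hypothetical stand-in for WeightedSpanTermExtractor.mustRewriteQuery(SpanQuery), and treating a null result from the hook as "defer to the default" is an assumption.

```java
import java.util.function.Function;

class RewriteHookSketch {
    // Hypothetical stand-in for the default decision: known query kinds are
    // not rewritten, anything unknown is.
    static boolean defaultMustRewrite(String queryKind) {
        return !(queryKind.equals("term") || queryKind.equals("near"));
    }

    // Consult the hook first; a null answer (assumed convention) defers to the default.
    static boolean mustRewrite(String queryKind, Function<String, Boolean> rewriteQueryPred) {
        Boolean override = rewriteQueryPred.apply(queryKind);
        return override != null ? override : defaultMustRewrite(queryKind);
    }

    public static void main(String[] args) {
        // The hook vouches for one custom kind and stays silent about everything else.
        Function<String, Boolean> hook =
            kind -> kind.equals("myCustomSpan") ? Boolean.FALSE : null;

        System.out.println(mustRewrite("myCustomSpan", hook)); // hook says no rewrite is needed
        System.out.println(mustRewrite("wildcard", hook));     // unknown kind, default applies
        System.out.println(mustRewrite("term", hook));         // known kind, default applies
    }
}
```

Returning Boolean.FALSE only for query types you are sure about keeps the conservative default in place for everything else.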
fieldMatcher - The field name predicate to use for extracting the query part that must be highlighted.
java.util.Set<SpanQuery> getSpanQueries()
boolean hasPositionSensitivity()
boolean willRewrite()
Rewrite is needed for handling a SpanMultiTermQueryWrapper (MTQ / wildcards) or some
custom things. When true, the resulting term list will probably be different from what it was known
to be initially.
java.util.Map<BytesRef,Spans> getTermToSpans(LeafReader leafReader, int doc) throws java.io.IOException
Collect a list of pre-positioned Spans for each term, given a reader that has just one document.
It returns no mapping for query terms that occur in a position-insensitive way, which therefore don't
need to be filtered.
Throws: java.io.IOException
private void getTermToSpans(SpanQuery spanQuery, LeafReaderContext readerContext, int doc, java.util.Map<BytesRef,Spans> result) throws java.io.IOException
Throws: java.io.IOException
java.util.List<BytesRef> expandTermsIfRewrite(BytesRef[] terms, java.util.Map<BytesRef,Spans> strictPhrasesTermToSpans)
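
The summary for expandTermsIfRewrite describes widening the term list with the keys of the term-to-spans map. A rough sketch of that idea follows, under stated assumptions: String stands in for BytesRef, the willRewrite flag is passed as a parameter instead of being read from the instance, and returning the union in sorted order is a guess at the intended behaviour, not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExpandTermsSketch {
    // Returns the original terms, or the sorted union of the terms and the map's
    // keys when a rewrite happened and the map actually contributes new terms.
    static List<String> expandTermsIfRewrite(String[] terms,
                                             Map<String, ?> termToSpans,
                                             boolean willRewrite) {
        if (willRewrite) {
            Set<String> all = new TreeSet<>(Arrays.asList(terms));
            // addAll returns true only if at least one key was not already present.
            if (all.addAll(termToSpans.keySet())) {
                return new ArrayList<>(all);
            }
        }
        return Arrays.asList(terms); // no rewrite, or nothing new: keep the input as-is
    }
}
```

In this model, a wildcard clause that matched the term "wildcat" in the document would add "wildcat" to the list even though it never appeared among the original query terms.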
PostingsEnum filterPostings(BytesRef term, PostingsEnum postingsEnum, Spans spans) throws java.io.IOException
Returns a filtered postings where the position must be in the given Spans.
The postingsEnum should be positioned at the
document (the same one as the spans) but it hasn't iterated the positions yet.
The Spans should be the result of a simple lookup from getTermToSpans(LeafReader, int), and so it could be
null, which could mean either it's completely filtered or that there should be no filtering; this class knows what to do.
Due to limitations in filtering, the PostingsEnum.freq() is unchanged even if some positions
get filtered. So when PostingsEnum.nextPosition(), startOffset, or endOffset is called
beyond the "real" positions, these methods return Integer.MAX_VALUE.
This will return null if it's completely filtered out (i.e. effectively has no postings).
Throws: java.io.IOException
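
The contract described for filterPostings (freq() left unchanged, Integer.MAX_VALUE past the surviving positions, null when nothing survives) can be illustrated with a small stand-alone model. This is a sketch under assumptions, not the Lucene implementation: Span and FilteredPositions are hypothetical types, positions are a plain int array, and span ends are treated as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

class FilteredPositionsSketch {
    // Hypothetical span: start inclusive, end exclusive (an assumption of this model).
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Iterates only the positions that survived filtering; freq() still reports
    // the unfiltered count, mirroring the limitation described in the text.
    static final class FilteredPositions {
        private final int originalFreq;
        private final List<Integer> kept;
        private int next = 0;

        FilteredPositions(int originalFreq, List<Integer> kept) {
            this.originalFreq = originalFreq;
            this.kept = kept;
        }

        int freq() { return originalFreq; }

        // Past the last surviving position, report Integer.MAX_VALUE instead of failing.
        int nextPosition() {
            return next < kept.size() ? kept.get(next++) : Integer.MAX_VALUE;
        }
    }

    // Keeps positions that fall inside at least one span; null means fully filtered out.
    static FilteredPositions filter(int[] positions, List<Span> spans) {
        List<Integer> kept = new ArrayList<>();
        for (int pos : positions) {
            for (Span s : spans) {
                if (pos >= s.start && pos < s.end) {
                    kept.add(pos);
                    break;
                }
            }
        }
        return kept.isEmpty() ? null : new FilteredPositions(positions.length, kept);
    }
}
```

A caller that loops freq() times over nextPosition() therefore has to treat Integer.MAX_VALUE as "no more real positions" and stop early.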