final class SloppyPhraseScorer extends Scorer
Nested classes/interfaces inherited from class Scorer: Scorer.ChildScorer
Modifier and Type | Field and Description |
---|---|
private boolean | checkedRpts |
private DocIdSetIterator | conjunction |
private Similarity.SimScorer | docScorer |
private int | end |
private boolean | hasMultiTermRpts |
private boolean | hasRpts |
private float | matchCost |
(package private) boolean | needsScores |
private int | numMatches |
private int | numPostings |
private PhrasePositions[] | phrasePositions |
private PhraseQueue | pq |
private PhrasePositions[][] | rptGroups |
private PhrasePositions[] | rptStack |
private int | slop |
private float | sloppyFreq |
Constructor and Description |
---|
SloppyPhraseScorer(Weight weight, PhraseQuery.PostingsAndFreq[] postings, int slop, Similarity.SimScorer docScorer, boolean needsScores, float matchCost) |
Modifier and Type | Method and Description |
---|---|
private boolean | advancePP(PhrasePositions pp): advance a PhrasePositions and update 'end'; return false if exhausted |
private boolean | advanceRepeatGroups(): At initialization (each doc), each repetition group is sorted by (query) offset. |
private boolean | advanceRpts(PhrasePositions pp): pp was just advanced. |
private int | collide(PhrasePositions pp): index of a pp2 colliding with pp, or -1 if none |
int | docID(): Returns the doc ID that is currently being scored. |
private void | fillQueue(): Fill the queue (all pps are already placed) |
int | freq(): Returns the freq of this Scorer on the current document |
private java.util.ArrayList<java.util.ArrayList<PhrasePositions>> | gatherRptGroups(java.util.LinkedHashMap<Term,java.lang.Integer> rptTerms): Detect repetition groups. |
private boolean | initComplex(): with repeats: not so simple. |
private boolean | initFirstTime(): initialize with checking for repeats. |
private boolean | initPhrasePositions(): Initialize PhrasePositions in place. |
private void | initSimple(): no repeats: simplest case, and most common. |
DocIdSetIterator | iterator(): Return a DocIdSetIterator over matching documents. |
private PhrasePositions | lesser(PhrasePositions pp, PhrasePositions pp2): compare two pps, but only by position and offset |
private float | phraseFreq(): Score a candidate doc for all slop-valid position-combinations (matches) encountered while traversing/hopping the PhrasePositions. |
private void | placeFirstPositions(): move all PPs to their first position |
private java.util.ArrayList<FixedBitSet> | ppTermsBitSets(PhrasePositions[] rpp, java.util.HashMap<Term,java.lang.Integer> tord): bit-sets: for each repeating pp, for each of its repeating terms, the term's ordinal value is set |
private PhrasePositions[] | repeatingPPs(java.util.HashMap<Term,java.lang.Integer> rptTerms): find repeating pps, and for each, if it has multi-terms, update this.hasMultiTermRpts |
private java.util.LinkedHashMap<Term,java.lang.Integer> | repeatingTerms(): find repeating terms and assign them ordinal values |
float | score(): Returns the score of the current document matching the query. |
(package private) float | sloppyFreq() |
private void | sortRptGroups(java.util.ArrayList<java.util.ArrayList<PhrasePositions>> rgs): sort each repetition group by (query) offset. |
private java.util.HashMap<Term,java.lang.Integer> | termGroups(java.util.LinkedHashMap<Term,java.lang.Integer> tord, java.util.ArrayList<FixedBitSet> bb): map each term to the single group that contains it |
java.lang.String | toString() |
private int | tpPos(PhrasePositions pp): Actual position in the doc of a PhrasePositions; relies on position = tpPos - offset |
TwoPhaseIterator | twoPhaseIterator(): Optional method: Return a TwoPhaseIterator view of this Scorer. |
private void | unionTermGroups(java.util.ArrayList<FixedBitSet> bb): union (term group) bit-sets until they are disjoint (O(n^2)), so that each group has different terms |
Methods inherited from class Scorer: getChildren, getWeight
private final DocIdSetIterator conjunction
private final PhrasePositions[] phrasePositions
private float sloppyFreq
private final Similarity.SimScorer docScorer
private final int slop
private final int numPostings
private final PhraseQueue pq
private int end
private boolean hasRpts
private boolean checkedRpts
private boolean hasMultiTermRpts
private PhrasePositions[][] rptGroups
private PhrasePositions[] rptStack
private int numMatches
final boolean needsScores
private final float matchCost
SloppyPhraseScorer(Weight weight, PhraseQuery.PostingsAndFreq[] postings, int slop, Similarity.SimScorer docScorer, boolean needsScores, float matchCost)
private float phraseFreq() throws java.io.IOException
Throws: java.io.IOException
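As a rough illustration of how a sloppy frequency can be accumulated over slop-valid matches, the sketch below models the no-repeats case in plain Java. It is not this class's code: `Pp` is a hypothetical stand-in for PhrasePositions, the queue ordering ignores the offset tie-break, at least two terms are assumed, and the slop factor 1 / (1 + matchLength) is an assumption standing in for whatever docScorer computes.

```java
import java.util.PriorityQueue;

class SloppyFreqSketch {
    /** Hypothetical stand-in for PhrasePositions: one query term's positions in one document. */
    static final class Pp {
        final int[] docPositions; // sorted positions of the term in the document (non-empty)
        final int offset;         // position of the term within the query
        int idx = 0;
        int position;             // docPositions[idx] - offset

        Pp(int[] docPositions, int offset) {
            this.docPositions = docPositions;
            this.offset = offset;
            this.position = docPositions[0] - offset;
        }

        /** Moves to the next occurrence; returns false if exhausted. */
        boolean next() {
            if (++idx >= docPositions.length) return false;
            position = docPositions[idx] - offset;
            return true;
        }
    }

    /** Sums an assumed slop factor over all matches whose length is within slop. */
    static float phraseFreq(Pp[] pps, int slop) {
        PriorityQueue<Pp> pq = new PriorityQueue<>((a, b) -> Integer.compare(a.position, b.position));
        int end = Integer.MIN_VALUE; // largest (offset-adjusted) position seen so far
        for (Pp pp : pps) {
            end = Math.max(end, pp.position);
            pq.add(pp);
        }
        float freq = 0f;
        Pp pp = pq.poll();                   // the pp that is furthest behind
        int matchLength = end - pp.position;
        int next = pq.peek().position;       // position of the runner-up
        while (pp.next()) {
            end = Math.max(end, pp.position);
            if (pp.position > next) {
                // done minimizing the current match length: score it if it is slop-valid
                if (matchLength <= slop) freq += 1f / (1 + matchLength);
                pq.add(pp);
                pp = pq.poll();
                next = pq.peek().position;
                matchLength = end - pp.position;
            } else {
                matchLength = Math.min(matchLength, end - pp.position);
            }
        }
        if (matchLength <= slop) freq += 1f / (1 + matchLength);
        return freq;
    }
}
```

For the query "quick fox" against the text "quick brown fox", the two terms sit one position further apart than in the query, so the sketch reports a match only when slop is at least 1.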
private boolean advancePP(PhrasePositions pp) throws java.io.IOException
Throws: java.io.IOException
private boolean advanceRpts(PhrasePositions pp) throws java.io.IOException
Throws: java.io.IOException
private PhrasePositions lesser(PhrasePositions pp, PhrasePositions pp2)
private int collide(PhrasePositions pp)
private boolean initPhrasePositions() throws java.io.IOException
Throws: java.io.IOException
private void initSimple() throws java.io.IOException
Throws: java.io.IOException
private boolean initComplex() throws java.io.IOException
Throws: java.io.IOException
private void placeFirstPositions() throws java.io.IOException
Throws: java.io.IOException
private void fillQueue()
private boolean advanceRepeatGroups() throws java.io.IOException
Case 1: no multi-term repeats
It is sufficient to advance each pp in the group by one less than its (1-based) group index.
So the lesser pp is not advanced, the 2nd one is advanced once, the 3rd one is advanced twice, etc.
Case 2: multi-term repeats
Throws: java.io.IOException
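Case 1 can be pictured with a small stand-alone sketch. It assumes a group in which every member iterates the same term's positions, already sorted by query offset; `Cursor` and `advanceSingleTermGroup` are hypothetical names used only for this illustration.

```java
class RepeatGroupSketch {
    /** Hypothetical cursor over the sorted document positions of one repeated term. */
    static final class Cursor {
        final int[] docPositions; // positions of the term in the document
        final int offset;         // position of this occurrence within the query
        int idx = 0;

        Cursor(int[] docPositions, int offset) {
            this.docPositions = docPositions;
            this.offset = offset;
        }

        /** Moves to the next occurrence; returns false if exhausted. */
        boolean nextPosition() {
            return ++idx < docPositions.length;
        }

        /** Offset-adjusted position; only valid while the cursor is not exhausted. */
        int position() {
            return docPositions[idx] - offset;
        }
    }

    /**
     * Advances the j-th member of the group (0-based) exactly j times, so that no two
     * members sit on the same occurrence. Returns false if any member runs out.
     */
    static boolean advanceSingleTermGroup(Cursor[] group) {
        for (int j = 1; j < group.length; j++) {
            for (int k = 0; k < j; k++) {
                if (!group[j].nextPosition()) {
                    return false; // exhausted
                }
            }
        }
        return true;
    }
}
```

With a query that repeats one term three times (offsets 0, 1, 2) and occurrences at document positions 3, 4, 5 and 9, the three cursors end up on occurrences 3, 4 and 5, which all map to the same offset-adjusted position 3.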
private boolean initFirstTime() throws java.io.IOException
If there are repetitions, check if multi-term postings (MTP) are involved.
Without MTP, once PPs are placed in the first candidate doc, repeats (and groups) are visible.
With MTP, a more complex check is needed, up-front, as there may be "hidden collisions".
For example P1 has {A,B}, P2 has {B,C}, and the first doc is: "A C B". At start, P1 would point
to "A", P2 to "C", and it will not be identified that P1 and P2 are repetitions of each other.
The more complex initialization has two parts:
(1) identification of repetition groups.
(2) advancing repeat groups at the start of the doc.
For (1), a possible solution is to just create a single repetition group,
made of all repeating pps. But this would slow down the check for collisions,
as all pps would need to be checked. Instead, we compute "connected regions"
on the bipartite graph of postings and terms.
Throws: java.io.IOException
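The "connected regions" idea can be sketched by merging term bit-sets until no two of them intersect. The example below is a minimal stand-alone version that uses java.util.BitSet in place of FixedBitSet; the class and method names are hypothetical and the term ordinals (A=0, B=1, C=2, D=3) are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class TermGroupUnionSketch {
    /** Builds a bit-set with the given term ordinals set. */
    static BitSet bits(int... ordinals) {
        BitSet b = new BitSet();
        for (int o : ordinals) b.set(o);
        return b;
    }

    /** Merges intersecting bit-sets in place until all remaining sets are pairwise disjoint. */
    static void unionUntilDisjoint(List<BitSet> groups) {
        int incr;
        for (int i = 0; i < groups.size() - 1; i += incr) {
            incr = 1;
            int j = i + 1;
            while (j < groups.size()) {
                if (groups.get(i).intersects(groups.get(j))) {
                    groups.get(i).or(groups.remove(j)); // absorb group j into group i
                    incr = 0; // group i grew, so re-check it against the remaining groups
                } else {
                    j++;
                }
            }
        }
    }

    public static void main(String[] args) {
        // P1 has {A,B}, P2 has {B,C}, and a hypothetical P3 has {D}
        List<BitSet> groups = new ArrayList<>();
        groups.add(bits(0, 1));
        groups.add(bits(1, 2));
        groups.add(bits(3));
        unionUntilDisjoint(groups);
        System.out.println(groups);
    }
}
```

Here P1 and P2 share term B, so their sets collapse into a single group, while the unrelated posting keeps its own group and never takes part in collision checks against the first two.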
private void sortRptGroups(java.util.ArrayList<java.util.ArrayList<PhrasePositions>> rgs)
private java.util.ArrayList<java.util.ArrayList<PhrasePositions>> gatherRptGroups(java.util.LinkedHashMap<Term,java.lang.Integer> rptTerms) throws java.io.IOException
Throws: java.io.IOException
private final int tpPos(PhrasePositions pp)
private java.util.LinkedHashMap<Term,java.lang.Integer> repeatingTerms()
private PhrasePositions[] repeatingPPs(java.util.HashMap<Term,java.lang.Integer> rptTerms)
private java.util.ArrayList<FixedBitSet> ppTermsBitSets(PhrasePositions[] rpp, java.util.HashMap<Term,java.lang.Integer> tord)
private void unionTermGroups(java.util.ArrayList<FixedBitSet> bb)
private java.util.HashMap<Term,java.lang.Integer> termGroups(java.util.LinkedHashMap<Term,java.lang.Integer> tord, java.util.ArrayList<FixedBitSet> bb) throws java.io.IOException
Throws: java.io.IOException
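public int freq()
Description copied from class: Scorer
Returns the freq of this Scorer on the current document
float sloppyFreq()
public int docID()
Description copied from class: Scorer
Returns the doc ID that is currently being scored. This will return -1 if the Scorer.iterator() is not positioned or DocIdSetIterator.NO_MORE_DOCS if it has been entirely consumed.
Specified by: docID in class Scorer
See Also: DocIdSetIterator.docID()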
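public float score() throws java.io.IOException
Description copied from class: Scorer
Returns the score of the current document matching the query. Initially invalid, until DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) is called on the Scorer.iterator() the first time, or when called from within LeafCollector.collect(int).
Throws: java.io.IOException
public java.lang.String toString()
Overrides: toString in class java.lang.Object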
public TwoPhaseIterator twoPhaseIterator()
Description copied from class: Scorer
Optional method: Return a TwoPhaseIterator view of this Scorer. A return value of null indicates that two-phase iteration is not supported.
Note that the returned TwoPhaseIterator's approximation must advance synchronously with the Scorer.iterator(): advancing the approximation must advance the iterator and vice-versa.
Implementing this method is typically useful on Scorers that have a high per-document overhead in order to confirm matches.
The default implementation returns null.
Overrides: twoPhaseIterator in class Scorer
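The two-phase contract can be pictured with a toy model that does not depend on Lucene: a cheap approximation proposes candidate documents, and a costlier check confirms each one. Everything in the sketch is hypothetical (`Approximation`, `collect`, and the predicate standing in for the match check); only the sentinel value mirrors DocIdSetIterator.NO_MORE_DOCS. For a sloppy phrase, the expensive step would presumably be the position check that phraseFreq() performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy approximation: walks a sorted array of candidate doc IDs. */
    static final class Approximation {
        private final int[] candidates;
        private int idx = -1; // -1 means "not positioned yet"

        Approximation(int[] candidates) {
            this.candidates = candidates;
        }

        int docID() {
            if (idx < 0) return -1;
            return idx >= candidates.length ? NO_MORE_DOCS : candidates[idx];
        }

        int nextDoc() {
            idx++;
            return docID();
        }
    }

    /** Phase 1 proposes a candidate; phase 2 runs the costly check on that candidate only. */
    static List<Integer> collect(Approximation approx, IntPredicate matches) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = approx.nextDoc(); doc != NO_MORE_DOCS; doc = approx.nextDoc()) {
            if (matches.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Keeping the confirmation step separate lets a caller intersect several cheap approximations first and pay for the costly check only on documents that survive.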
public DocIdSetIterator iterator()
Description copied from class: Scorer
Return a DocIdSetIterator over matching documents.
The returned iterator will either be positioned on -1 if no documents have been scored yet, DocIdSetIterator.NO_MORE_DOCS if all documents have been scored already, or the last document id that has been scored otherwise.
The returned iterator is a view: calling this method several times will return iterators that have the same state.