public final class CommonGramsFilter
extends org.apache.lucene.analysis.TokenFilter
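Bigrams are overlaid on the original terms using PositionIncrementAttribute.setPositionIncrement(int): each bigram gets a position increment of 0, so it shares a position with its left-hand term. Bigrams have a type of GRAM_TYPE.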
Example:
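A stdlib-only Java sketch that approximates the token sequence this filter produces for a whitespace-split sentence. It does not call Lucene. The separator '_', the bigram type value "gram", the unigram type "word", and the names CommonGramsSketch, Tok and commonGrams are assumptions for illustration. The rule it models is that every unigram is kept, and a bigram with position increment 0 is added whenever either of two adjacent terms is common.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, Lucene-free model of the common-grams token stream. */
final class CommonGramsSketch {
    // Assumed values: the listing names SEPARATOR and GRAM_TYPE but not their contents.
    static final char SEPARATOR = '_';
    static final String GRAM_TYPE = "gram";
    static final String WORD_TYPE = "word";

    /** A simplified token: term text, position increment, and type. */
    static final class Tok {
        final String term;
        final int posInc;
        final String type;

        Tok(String term, int posInc, String type) {
            this.term = term;
            this.posInc = posInc;
            this.type = type;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + "," + type + ")";
        }
    }

    /** Emits every unigram, plus a bigram whenever either neighbor is common. */
    static List<Tok> commonGrams(List<String> words, Set<String> common) {
        List<Tok> out = new ArrayList<>();
        String prev = null;          // previous unigram, null before the first token
        boolean prevCommon = false;
        for (String w : words) {
            boolean curCommon = common.contains(w);
            if (prev != null && (prevCommon || curCommon)) {
                // Increment 0: the bigram sits at the same position as its left term.
                out.add(new Tok(prev + SEPARATOR + w, 0, GRAM_TYPE));
            }
            out.add(new Tok(w, 1, WORD_TYPE));
            prev = w;
            prevCommon = curCommon;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        List<String> words = Arrays.asList("the", "quick", "brown", "fox");
        // prints: [the(+1,word), the_quick(+0,gram), quick(+1,word), brown(+1,word), fox(+1,word)]
        System.out.println(commonGrams(words, common));
    }
}
```

Because the bigram carries increment 0, a phrase query for "the quick" can match the single token the_quick instead of intersecting the very long posting list of "the".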
Modifier and Type | Field and Description |
---|---|
private java.lang.StringBuilder | buffer |
private org.apache.lucene.analysis.CharArraySet | commonWords |
(package private) static java.lang.String | GRAM_TYPE |
private int | lastStartOffset |
private boolean | lastWasCommon |
private org.apache.lucene.analysis.tokenattributes.OffsetAttribute | offsetAttribute |
private org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute | posIncAttribute |
private org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute | posLenAttribute |
private org.apache.lucene.util.AttributeSource.State | savedState |
private static char | SEPARATOR |
private org.apache.lucene.analysis.tokenattributes.CharTermAttribute | termAttribute |
private org.apache.lucene.analysis.tokenattributes.TypeAttribute | typeAttribute |
Constructor and Description |
---|
CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords) Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead. |
CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords, boolean ignoreCase) Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead. |
CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.lang.String[] commonWords) Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead. |
CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.lang.String[] commonWords, boolean ignoreCase) Deprecated. |
CommonGramsFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords) Construct a token stream filtering the given input using a Set of common words to create bigrams. |
CommonGramsFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords, boolean ignoreCase) Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead. |
Modifier and Type | Method and Description |
---|---|
private void | gramToken() Constructs a compound token. |
boolean | incrementToken() Inserts bigrams for common words into a token stream. |
private boolean | isCommon() Determines if the current token is a common term. |
static org.apache.lucene.analysis.CharArraySet | makeCommonSet(java.lang.String[] commonWords) Deprecated. Create a CharArraySet directly with the CharArraySet constructor instead. |
static org.apache.lucene.analysis.CharArraySet | makeCommonSet(java.lang.String[] commonWords, boolean ignoreCase) Deprecated. Create a CharArraySet directly with the CharArraySet constructor instead. |
void | reset() |
private void | saveTermBuffer() Saves the current token's information to form the left part of a gram. |
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
static final java.lang.String GRAM_TYPE
private static final char SEPARATOR
private final org.apache.lucene.analysis.CharArraySet commonWords
private final java.lang.StringBuilder buffer
private final org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAttribute
private final org.apache.lucene.analysis.tokenattributes.OffsetAttribute offsetAttribute
private final org.apache.lucene.analysis.tokenattributes.TypeAttribute typeAttribute
private final org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute posIncAttribute
private final org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute posLenAttribute
private int lastStartOffset
private boolean lastWasCommon
private org.apache.lucene.util.AttributeSource.State savedState
@Deprecated public CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords)
Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead.
@Deprecated public CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords, boolean ignoreCase)
Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead.
public CommonGramsFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords)
Construct a token stream filtering the given input using a Set of common words to create bigrams.
Parameters:
input - TokenStream input in filter chain.
commonWords - The set of common words.
@Deprecated public CommonGramsFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input, java.util.Set<?> commonWords, boolean ignoreCase)
Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead.
If commonWords is an instance of CharArraySet (true if makeCommonSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet directly controls case sensitivity.
If commonWords is not an instance of CharArraySet, a new CharArraySet is constructed and ignoreCase specifies the case sensitivity of that set.
Parameters:
input - TokenStream input in filter chain.
commonWords - The set of common words.
ignoreCase - Ignore case when constructing bigrams for common words.
@Deprecated public CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.lang.String[] commonWords)
Deprecated. Use CommonGramsFilter(Version, TokenStream, Set) instead.
Parameters:
input - TokenStream input in filter chain.
commonWords - Words to be used in constructing bigrams.
@Deprecated public CommonGramsFilter(org.apache.lucene.analysis.TokenStream input, java.lang.String[] commonWords, boolean ignoreCase)
Deprecated. Use CommonGramsFilter(Version, TokenStream, Set, boolean) instead.
Parameters:
input - TokenStream input in filter chain.
commonWords - Words to be used in constructing bigrams.
ignoreCase - Ignore case when constructing bigrams for common words.
@Deprecated public static org.apache.lucene.analysis.CharArraySet makeCommonSet(java.lang.String[] commonWords)
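Deprecated. Create a CharArraySet directly with the CharArraySet constructor instead.
Parameters:
commonWords - Array of common words which will be converted into the CharArraySet.
See Also:
makeCommonSet(java.lang.String[], boolean), passing false to ignoreCase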
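makeCommonSet turns an array into a lookup set, the overload that takes ignoreCase lowercases every word first, and the four-argument constructor either reuses a CharArraySet as-is or copies any other Set using ignoreCase. The stdlib-only sketch below models that behaviour with a hypothetical CaseAwareWordSet class; it is not Lucene's CharArraySet. It assumes lookups are lowercased the same way as stored words (otherwise mixed-case input could never match) and that lowercasing uses Locale.ROOT; adopt is a hypothetical name for the constructor's reuse-or-copy policy.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a case-aware word set; this is not Lucene's CharArraySet. */
final class CaseAwareWordSet extends AbstractSet<String> {
    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    CaseAwareWordSet(Collection<?> source, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (Object word : source) {
            words.add(normalize(word.toString()));
        }
    }

    // Assumption: with ignoreCase, terms are lowercased (Locale.ROOT) on both add and lookup.
    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    @Override
    public boolean contains(Object o) {
        return o != null && words.contains(normalize(o.toString()));
    }

    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableSet(words).iterator();
    }

    @Override
    public int size() {
        return words.size();
    }

    /** Analogue of makeCommonSet(String[], boolean). */
    static CaseAwareWordSet makeCommonSet(String[] commonWords, boolean ignoreCase) {
        return new CaseAwareWordSet(Arrays.asList(commonWords), ignoreCase);
    }

    /** Analogue of makeCommonSet(String[]): passes false to ignoreCase. */
    static CaseAwareWordSet makeCommonSet(String[] commonWords) {
        return makeCommonSet(commonWords, false);
    }

    /** Constructor policy: reuse a case-aware set as-is, otherwise copy it using ignoreCase. */
    static CaseAwareWordSet adopt(Set<?> commonWords, boolean ignoreCase) {
        if (commonWords instanceof CaseAwareWordSet) {
            return (CaseAwareWordSet) commonWords; // the set's own ignoreCase wins
        }
        return new CaseAwareWordSet(commonWords, ignoreCase);
    }

    public static void main(String[] args) {
        CaseAwareWordSet common = makeCommonSet(new String[] {"The", "of"}, true);
        System.out.println(common.contains("THE"));         // true
        System.out.println(adopt(common, false) == common); // true: reused, still case-insensitive
    }
}
```

The second line of main shows the consequence spelled out in the constructor documentation: when the set already controls case, the ignoreCase argument has no effect.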
@Deprecated public static org.apache.lucene.analysis.CharArraySet makeCommonSet(java.lang.String[] commonWords, boolean ignoreCase)
Deprecated. Create a CharArraySet directly with the CharArraySet constructor instead.
Parameters:
commonWords - Array of common words which will be converted into the CharArraySet.
ignoreCase - If true, all words are lowercased first.
public boolean incrementToken() throws java.io.IOException
Inserts bigrams for common words into a token stream.
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
public void reset() throws java.io.IOException
Overrides:
reset in class org.apache.lucene.analysis.TokenFilter
Throws:
java.io.IOException
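incrementToken returns one token per call, so when a bigram is due the current term has to be held back until the next call; the savedState field and the reset override are where that per-stream bookkeeping lives. The plain-Java sketch below models this pull protocol under stated assumptions: input is a hypothetical Iterator of term strings, a pending string stands in for savedState, the separator is assumed to be '_', and reset takes a new iterator, which is a simplification of how a Lucene filter chain is reused. It is a model, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical pull-style model of a common-grams filter's state machine. */
final class PullCommonGrams {
    private static final char SEPARATOR = '_'; // assumed separator value

    private final Set<String> common;
    private Iterator<String> input;
    private final StringBuilder buffer = new StringBuilder(); // left part of a gram
    private boolean lastWasCommon;
    private String pending; // stands in for savedState
    private String current; // token produced by the last incrementToken() call

    PullCommonGrams(Iterator<String> input, Set<String> common) {
        this.input = input;
        this.common = common;
    }

    /** Returns false when exhausted; otherwise current() holds the new token. */
    boolean incrementToken() {
        if (pending != null) {
            // Second half of a gram step: emit the held-back unigram.
            current = pending;
            pending = null;
            saveTermBuffer(current);
            return true;
        }
        if (!input.hasNext()) {
            return false;
        }
        String term = input.next();
        if (lastWasCommon || (common.contains(term) && buffer.length() > 0)) {
            pending = term;                           // hold the unigram for the next call
            current = buffer.append(term).toString(); // left + SEPARATOR + right
            buffer.setLength(0);
            return true;
        }
        current = term;
        saveTermBuffer(term);
        return true;
    }

    /** Remembers the term (plus separator) as the left part of a future gram. */
    private void saveTermBuffer(String term) {
        buffer.setLength(0);
        buffer.append(term).append(SEPARATOR);
        lastWasCommon = common.contains(term);
    }

    String current() {
        return current;
    }

    /** Clears per-stream state so the filter can be reused on new input. */
    void reset(Iterator<String> newInput) {
        input = newInput;
        buffer.setLength(0);
        lastWasCommon = false;
        pending = null;
    }

    static List<String> drain(PullCommonGrams f) {
        List<String> out = new ArrayList<>();
        while (f.incrementToken()) {
            out.add(f.current());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        PullCommonGrams f = new PullCommonGrams(Arrays.asList("the", "quick", "brown", "fox").iterator(), common);
        System.out.println(drain(f)); // [the, the_quick, quick, brown, fox]
        f.reset(Arrays.asList("lots", "of").iterator());
        System.out.println(drain(f)); // [lots, lots_of, of]
    }
}
```

Clearing lastWasCommon and the buffer in reset matters: a stream ending in a common word would otherwise glue its last term onto the first term of the next stream.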
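private boolean isCommon()
Determines if the current token is a common term.
Returns:
true if the current token is a common term, false otherwise
private void saveTermBuffer()
Saves the current token's information to form the left part of a gram.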
private void gramToken()
Constructs a compound token.
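saveTermBuffer keeps the left term, and gramToken appends the right term to build the compound token. Judging from the fields listed above (lastStartOffset, offsetAttribute, posIncAttribute, posLenAttribute, typeAttribute), the gram's attributes can be modeled as follows: the text joins both terms with the separator, the offsets run from the left term's start to the right term's end, the position increment is 0, and the position length is assumed to be 2 because a bigram spans two positions. This Lucene-free sketch uses a hypothetical Span class; the '_' separator and the "gram" type value are assumptions.

```java
/** Hypothetical, Lucene-free model of the attributes on a compound (gram) token. */
final class GramTokenSketch {
    static final char SEPARATOR = '_';      // assumed separator value
    static final String GRAM_TYPE = "gram"; // assumed value of GRAM_TYPE

    /** A token's term text plus the attributes this filter sets. */
    static final class Span {
        final String term;
        final int startOffset, endOffset, posInc, posLen;
        final String type;

        Span(String term, int startOffset, int endOffset, int posInc, int posLen, String type) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.posInc = posInc;
            this.posLen = posLen;
            this.type = type;
        }
    }

    /** Combines a left term and the following right term into a gram token. */
    static Span gram(Span left, Span right) {
        return new Span(
                left.term + SEPARATOR + right.term,
                left.startOffset, // plays the role of lastStartOffset
                right.endOffset,
                0,                // same position as the left term
                2,                // spans two positions
                GRAM_TYPE);
    }

    public static void main(String[] args) {
        // Offsets for the text "the quick": "the" = [0,3), "quick" = [4,9).
        Span the = new Span("the", 0, 3, 1, 1, "word");
        Span quick = new Span("quick", 4, 9, 1, 1, "word");
        Span g = gram(the, quick);
        // prints: the_quick [0,9) posInc=0 posLen=2 type=gram
        System.out.println(g.term + " [" + g.startOffset + "," + g.endOffset + ") posInc="
                + g.posInc + " posLen=" + g.posLen + " type=" + g.type);
    }
}
```

Covering the full source span with the gram's offsets lets a highlighter mark both words when the bigram matches.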