public class HyphenationCompoundWordTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenator.xml" encoding="UTF-8" dictionary="dictionary.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
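To make the size parameters more concrete, the following is a minimal sketch of how minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch might interact. It is not the Lucene implementation: the class `SubwordSketch` and its `decompose` method are hypothetical, and the sketch tries every substring against a word set instead of using hyphenation points as the real filter does. Here "longest match" is taken per start position, which is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Solr or Lucene.
public class SubwordSketch {

    /**
     * Simplified stand-in for compound decomposition: tries every substring
     * whose length lies in [minSubwordSize, maxSubwordSize] and keeps those
     * found in the dictionary. The real filter narrows the candidates using
     * hyphenation points; this sketch does not.
     */
    public static List<String> decompose(String word, Set<String> dictionary,
                                         int minWordSize, int minSubwordSize,
                                         int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        if (word.length() < minWordSize) {
            return subwords; // words shorter than minWordSize are not decomposed
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            int maxLen = Math.min(maxSubwordSize, lower.length() - start);
            for (int len = minSubwordSize; len <= maxLen; len++) {
                String candidate = lower.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // a later hit is always longer
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("fuss", "fussball", "ball", "pumpe"));
        // prints [fuss, fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, false));
        // prints [fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, true));
    }
}
```

In this sketch, lowering maxSubwordSize to 4 would drop "fussball" and "pumpe" from the first result, and a word shorter than minWordSize yields no subwords at all.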
See Also: HyphenationCompoundWordTokenFilter
Modifier and Type | Field and Description |
---|---|
private java.lang.String | dictFile |
private org.apache.lucene.analysis.CharArraySet | dictionary |
private java.lang.String | encoding |
private java.lang.String | hypFile |
private org.apache.lucene.analysis.compound.hyphenation.HyphenationTree | hyphenator |
private int | maxSubwordSize |
private int | minSubwordSize |
private int | minWordSize |
private boolean | onlyLongestMatch |
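Fields inherited from class BaseTokenFilterFactory: log
Fields inherited from class BaseTokenStreamFactory: args, luceneMatchVersion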
Constructor and Description |
---|
HyphenationCompoundWordTokenFilterFactory() |
Modifier and Type | Method and Description |
---|---|
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter | create(org.apache.lucene.analysis.TokenStream input) Transform the specified input TokenStream. |
void | inform(ResourceLoader loader) |
void | init(java.util.Map<java.lang.String,java.lang.String> args) init will be called just once, immediately after creation. |
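Methods inherited from class BaseTokenStreamFactory: assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getSnowballWordSet, getWordSet, warnDeprecated
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TokenFilterFactory: getArgs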
private org.apache.lucene.analysis.CharArraySet dictionary
private org.apache.lucene.analysis.compound.hyphenation.HyphenationTree hyphenator
private java.lang.String dictFile
private java.lang.String hypFile
private java.lang.String encoding
private int minWordSize
private int minSubwordSize
private int maxSubwordSize
private boolean onlyLongestMatch
public HyphenationCompoundWordTokenFilterFactory()
public void init(java.util.Map<java.lang.String,java.lang.String> args)
Description copied from interface: TokenFilterFactory
init will be called just once, immediately after creation.
The args are user-level initialization parameters that may be specified when declaring the factory in schema.xml.
Specified by: init in interface TokenFilterFactory
Overrides: init in class BaseTokenStreamFactory
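As an illustration of how the args map relates to the documented parameters, here is a minimal sketch that reads the same keys and applies the documented defaults. The class `FactoryArgsSketch` is hypothetical and does not reproduce the factory's actual init code; in particular, the exception thrown for a missing hyphenator is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or Lucene.
public class FactoryArgsSketch {
    public final String hyphenator;
    public final String encoding;
    public final String dictionary; // null means "no dictionary"
    public final int minWordSize;
    public final int minSubwordSize;
    public final int maxSubwordSize;
    public final boolean onlyLongestMatch;

    public FactoryArgsSketch(Map<String, String> args) {
        // hyphenator is the only mandatory parameter
        hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("Missing required parameter: hyphenator");
        }
        // optional parameters fall back to the documented defaults
        encoding = args.getOrDefault("encoding", "UTF-8");
        dictionary = args.get("dictionary");
        minWordSize = Integer.parseInt(args.getOrDefault("minWordSize", "5"));
        minSubwordSize = Integer.parseInt(args.getOrDefault("minSubwordSize", "2"));
        maxSubwordSize = Integer.parseInt(args.getOrDefault("maxSubwordSize", "15"));
        onlyLongestMatch = Boolean.parseBoolean(args.getOrDefault("onlyLongestMatch", "false"));
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("hyphenator", "hyphenator.xml");
        args.put("minSubwordSize", "3");
        FactoryArgsSketch cfg = new FactoryArgsSketch(args);
        // prints UTF-8 5 3 15 false
        System.out.println(cfg.encoding + " " + cfg.minWordSize + " " + cfg.minSubwordSize
                + " " + cfg.maxSubwordSize + " " + cfg.onlyLongestMatch);
    }
}
```

The keys in the map correspond to the attributes on the filter element in schema.xml, so an attribute that is omitted there simply leaves its default in place.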
public void inform(ResourceLoader loader)
Specified by: inform in interface ResourceLoaderAware
public org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter create(org.apache.lucene.analysis.TokenStream input)
Description copied from interface: TokenFilterFactory
Transform the specified input TokenStream.
Specified by: create in interface TokenFilterFactory