public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Concrete implementations can leverage the following input attributes. All attributes are optional:
ignoreCase - defaults to false.
words - the name of a stopwords file to parse. If not specified, the factory
uses the value returned by the concrete subclass's createDefaultWords()
implementation.
format - defines how the words file is parsed, and defaults to wordset.
If words is not specified, then format must not be specified either.
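The attribute rules above can be sketched in plain Java. This is only an illustration under stated assumptions: `WordsArgsSketch` is a hypothetical stand-in rather than the Lucene class, it validates eagerly in its constructor for simplicity, and it models nothing beyond the three documented attributes.

```java
import java.util.Map;

// Hypothetical stand-in, not the Lucene class: models only the documented attribute rules.
final class WordsArgsSketch {
    static final String FORMAT_WORDSET = "wordset";
    static final String FORMAT_SNOWBALL = "snowball";

    final String wordFiles;   // null when "words" was not supplied
    final String format;      // null when no words file is configured
    final boolean ignoreCase; // false unless "ignoreCase" is "true"

    WordsArgsSketch(Map<String, String> args) {
        wordFiles = args.get("words");
        String requested = args.get("format");
        ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));

        if (wordFiles == null) {
            // Rule from the text: format must not be given without a words file.
            if (requested != null) {
                throw new IllegalArgumentException(
                    "'format' requires an explicit 'words' file: " + requested);
            }
            format = null;
        } else {
            // format defaults to wordset when a words file is given.
            format = (requested == null) ? FORMAT_WORDSET : requested;
            if (!FORMAT_WORDSET.equals(format) && !FORMAT_SNOWBALL.equals(format)) {
                throw new IllegalArgumentException("Unknown 'format': " + format);
            }
        }
    }
}
```

With this sketch, `new WordsArgsSketch(Map.of("words", "stop.txt"))` ends up with `format` equal to `wordset`, while `new WordsArgsSketch(Map.of("format", "snowball"))` is rejected.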
The valid values for the format option are:
wordset - This is the default format, which supports one word per line
(including any intra-word whitespace) and allows whole line comments beginning with the "#"
character. Blank lines are ignored. See WordlistLoader.getLines for details.
snowball - This format allows for multiple words specified on each line, and
trailing comments may be specified using the vertical line ("|"). Blank lines are
ignored. See WordlistLoader.getSnowballWordSet
for details.
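To make the difference between the two formats concrete, here is a minimal sketch of both parsing rules in plain Java. It is a simplified illustration based only on the descriptions above, not the WordlistLoader code; the class and method names (`WordFormatSketch`, `parseWordset`, `parseSnowball`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two documented file formats.
final class WordFormatSketch {

    // wordset: one entry per line, "#" starts a whole-line comment, blank lines skipped.
    static List<String> parseWordset(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#")) continue;   // whole-line comment
            String word = line.trim();            // intra-word whitespace is kept
            if (word.isEmpty()) continue;         // blank line
            words.add(word);
        }
        return words;
    }

    // snowball: several whitespace-separated words per line, "|" starts a trailing comment.
    static List<String> parseSnowball(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = (bar >= 0) ? line.substring(0, bar) : line;
            for (String word : content.trim().split("\\s+")) {
                if (!word.isEmpty()) words.add(word); // skips blank lines
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [new york, the]
        System.out.println(parseWordset(Arrays.asList("# comment", "", "new york", "the")));
        // prints [a, an, the, of]
        System.out.println(parseSnowball(Arrays.asList("a an the | articles", "", "of")));
    }
}
```

Note that the same line, `new york`, yields one entry under wordset but two entries under snowball.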
| Modifier and Type | Field and Description |
|---|---|
| static String | FORMAT_SNOWBALL |
| static String | FORMAT_WORDSET |

Inherited fields: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

| Constructor and Description |
|---|
| AbstractWordsFileFilterFactory(Map<String,String> args) Initialize this factory via a set of key-value pairs. |

| Modifier and Type | Method and Description |
|---|---|
| protected abstract CharArraySet | createDefaultWords() Default word set implementation. |
| String | getFormat() |
| String | getWordFiles() |
| CharArraySet | getWords() |
| void | inform(ResourceLoader loader) Initialize the set of stopwords provided via ResourceLoader, or using defaults. |
| boolean | isIgnoreCase() |

Inherited methods: availableTokenFilters, create, findSPIName, forName, lookupClass, normalize, reloadTokenFilters, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

public static final String FORMAT_WORDSET
public static final String FORMAT_SNOWBALL
public void inform(ResourceLoader loader) throws IOException
Specified by: inform in interface ResourceLoaderAware
Throws: IOException
protected abstract CharArraySet createDefaultWords()
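The relationship between inform and createDefaultWords() resembles a template method: words from a configured file take priority, and the subclass default is used otherwise. The sketch below models that idea in plain Java under loud assumptions: none of these types are Lucene's, a `List<String>` stands in for the already-parsed words file, and a case-insensitive `TreeSet` stands in for a CharArraySet built with ignoreCase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the fallback behaviour; not the Lucene implementation.
abstract class WordsFactorySketch {
    private final List<String> loadedWords; // null models "no words file configured"
    private final boolean ignoreCase;
    private Set<String> words;

    WordsFactorySketch(List<String> loadedWords, boolean ignoreCase) {
        this.loadedWords = loadedWords;
        this.ignoreCase = ignoreCase;
    }

    // Subclasses supply the word set used when no file is configured.
    protected abstract Set<String> createDefaultWords();

    // Analogue of inform(...): use the configured words, or fall back to the defaults.
    void inform() {
        if (loadedWords != null) {
            Set<String> set;
            if (ignoreCase) {
                set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
            } else {
                set = new TreeSet<String>();
            }
            set.addAll(loadedWords);
            words = set;
        } else {
            words = createDefaultWords();
        }
    }

    Set<String> getWords() {
        return words;
    }
}

// Hypothetical concrete subclass with a tiny default word set.
final class DefaultingSketch extends WordsFactorySketch {
    DefaultingSketch(List<String> loadedWords, boolean ignoreCase) {
        super(loadedWords, ignoreCase);
    }

    @Override
    protected Set<String> createDefaultWords() {
        return new TreeSet<String>(Arrays.asList("a", "the"));
    }
}
```

In this model, `new DefaultingSketch(null, false)` yields the default set after `inform()`, while passing an explicit list replaces the defaults entirely.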
public CharArraySet getWords()
public String getWordFiles()
public String getFormat()
public boolean isIgnoreCase()
Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.