public class OpenNLPLemmatizerFilter extends TokenFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.
Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech
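The dictionary format and the lookup order described above can be sketched with the standard library alone. This is an illustration, not Lucene's or OpenNLP's implementation: the class `LemmaDictionarySketch`, its methods, and the `fallback` parameter (standing in for the MaxEnt lemmatizer) are all hypothetical names. Keying lookups on word plus part-of-speech tag, and returning the surface form when nothing matches, are assumptions made for the sketch.
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper, not part of Lucene or OpenNLP.
class LemmaDictionarySketch {
    // Key: word + TAB + part-of-speech tag; value: lemma.
    private final Map<String, String> lemmas = new HashMap<>();

    // Parses lines of the form word<TAB>lemma<TAB>part-of-speech.
    static LemmaDictionarySketch parse(List<String> lines) {
        LemmaDictionarySketch dict = new LemmaDictionarySketch();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines (an assumption of this sketch)
            }
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                    "Expected word<TAB>lemma<TAB>part-of-speech: " + line);
            }
            dict.lemmas.put(fields[0] + "\t" + fields[2], fields[1]);
        }
        return dict;
    }

    // Tries the dictionary first; consults the fallback only for
    // out-of-vocabulary tokens. With no fallback, the word is returned
    // unchanged (an assumption of this sketch).
    String lemmatize(String word, String pos, BinaryOperator<String> fallback) {
        String lemma = lemmas.get(word + "\t" + pos);
        if (lemma != null) {
            return lemma;
        }
        return fallback == null ? word : fallback.apply(word, pos);
    }

    public static void main(String[] args) {
        LemmaDictionarySketch dict = parse(List.of(
            "running\trun\tVBG",
            "mice\tmouse\tNNS"));
        // Dictionary hit.
        System.out.println(dict.lemmatize("running", "VBG", null));
        // Dictionary miss, handled by a stand-in fallback that lowercases.
        System.out.println(dict.lemmatize("Geese", "NNS", (w, p) -> w.toLowerCase()));
    }
}
```
In a real analysis chain the part-of-speech tag would come from an upstream tagging step rather than being passed in by hand, and the dictionary lines would be read from the UTF-8 file named by the "dictionary" param.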
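Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
| Constructor and Description |
|---|
| OpenNLPLemmatizerFilter(TokenStream input, NLPLemmatizerOp lemmatizerOp) |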
| Modifier and Type | Method and Description |
|---|---|
| boolean | incrementToken() |
| void | reset() |
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public OpenNLPLemmatizerFilter(TokenStream input, NLPLemmatizerOp lemmatizerOp)
public final boolean incrementToken()
throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset()
throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.