| Package | Description |
|---|---|
| org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |

| Class | Description |
|---|---|
| ICUTokenizer | Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). |
| ICUTokenizerConfig | Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis. |
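
To illustrate the kind of word segmentation these classes perform, here is a minimal sketch that uses the JDK's `java.text.BreakIterator` as a stand-in. It is not `ICUTokenizer` and does not use any Lucene or ICU API; the class name `WordSegmentationSketch` and the method `words` are hypothetical, and the rule of keeping only segments that contain a letter or digit is an assumption made for this example. The JDK's word rules may differ from ICU's in detail.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class: JDK word-boundary analysis, not Lucene's ICUTokenizer.
class WordSegmentationSketch {

    // Splits text at word boundaries and keeps segments that contain a letter or digit.
    static List<String> words(String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(text);

        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Assumption for this sketch: drop whitespace-only and punctuation-only segments.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each word-like segment on its own line.
        for (String word : words("Hello, world")) {
            System.out.println(word);
        }
    }
}
```

In Lucene itself, the tokenizer would be consumed through the usual `TokenStream` workflow rather than a helper like this; the sketch only shows the boundary-then-filter idea behind word-level tokenization.
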
Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.