public class DatasetSplitter extends Object
| Constructor and Description |
|---|
| `DatasetSplitter(double testRatio, double crossValidationRatio)` Create a DatasetSplitter by specifying the sizes of the test and cross-validation indexes. |
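
The two constructor arguments are fractions of the original index, so a rough estimate of how many documents end up in each output index can be worked out ahead of time. The sketch below is only an illustration of that arithmetic: the class `SplitSizeSketch` and the helper `expectedSizes` are hypothetical names, not part of the library, and the rounding rule and the check that the ratios sum to at most 1.0 are assumptions of this sketch rather than documented behavior of `DatasetSplitter`.

```java
public class SplitSizeSketch {

    /**
     * Hypothetical helper: estimates {training, test, crossValidation} sizes
     * for an index of numDocs documents.
     */
    public static long[] expectedSizes(long numDocs, double testRatio, double crossValidationRatio) {
        if (numDocs < 0) {
            throw new IllegalArgumentException("numDocs must be non-negative");
        }
        // The documentation states that each ratio is a double between 0.0 and 1.0.
        if (testRatio < 0.0 || testRatio > 1.0
                || crossValidationRatio < 0.0 || crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must be between 0.0 and 1.0");
        }
        // Assumption of this sketch: the two ratios together cannot exceed the whole index.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must not sum to more than 1.0");
        }
        // Assumption: round each share down; the real splitter may round differently.
        long test = (long) Math.floor(numDocs * testRatio);
        long crossValidation = (long) Math.floor(numDocs * crossValidationRatio);
        // Whatever is not assigned to test or cross validation is left for training.
        long training = numDocs - test - crossValidation;
        return new long[] {training, test, crossValidation};
    }

    public static void main(String[] args) {
        long[] sizes = expectedSizes(1000, 0.2, 0.1);
        // prints "training=700 test=200 crossValidation=100"
        System.out.println("training=" + sizes[0]
                + " test=" + sizes[1]
                + " crossValidation=" + sizes[2]);
    }
}
```

With `testRatio = 0.2` and `crossValidationRatio = 0.1`, roughly 70% of the documents would remain for the training index under these assumptions.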
| Modifier and Type | Method and Description |
|---|---|
| void | `split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames)` Split a given index into 3 indexes for training, test and cross validation tasks respectively. |
public DatasetSplitter(double testRatio,
double crossValidationRatio)
Create a DatasetSplitter by specifying the sizes of the test and cross-validation indexes.

Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross-validation index, as a double between 0.0 and 1.0

public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames) throws IOException
Split a given index into 3 indexes for training, test and cross validation tasks respectively.

Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this must be indexed with sorted doc values
fieldNames - names of fields that need to be put in the new indexes, or null if all should be used

Throws:
IOException - if any writing operation fails on any of the indexes

Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.