Object
  org.apache.spark.ml.PipelineStage
    org.apache.spark.ml.Transformer
      org.apache.spark.ml.Model<VectorIndexerModel>
        org.apache.spark.ml.feature.VectorIndexerModel
public class VectorIndexerModel
:: Experimental ::
Transforms categorical features to use 0-based indices instead of their original values.
- Categorical features are mapped to indices.
- Continuous features (columns) are left unchanged.

This also appends metadata to the output column, marking features as Numeric (continuous), Nominal (categorical), or Binary (either continuous or categorical). Non-ML metadata is not carried over from the input to the output column.

This maintains vector sparsity: sparse input vectors produce sparse output vectors.
Parameters:
numFeatures - Number of features, i.e., the length of the vectors this model transforms.
categoryMaps - Feature value index. Keys are categorical feature indices (column indices). Values are maps from original feature values to 0-based category indices. If a feature is not in this map, it is treated as continuous.
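The `categoryMaps` lookup described above can be illustrated without Spark. The following is a minimal plain-Java sketch under stated assumptions, not Spark's implementation: `CategoryIndexSketch`, `transformDense`, and `transformSparse` are hypothetical names, vectors are modeled as plain arrays, and a value missing from a feature's map is assumed to be an error. The sparse variant rewrites only stored entries, which preserves sparsity only if every category map sends 0.0 to index 0 (also an assumption of this sketch).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the categoryMaps lookup; not Spark's implementation.
public class CategoryIndexSketch {

    // Replace each categorical value by its 0-based category index;
    // features absent from categoryMaps are continuous and copied unchanged.
    static double[] transformDense(double[] values, Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = values.clone();
        for (Map.Entry<Integer, Map<Double, Integer>> e : categoryMaps.entrySet()) {
            int feature = e.getKey();
            out[feature] = lookup(e.getValue(), values[feature], feature);
        }
        return out;
    }

    // Sparse vector given as parallel (indices, values) arrays. Only stored
    // entries are rewritten, so implicit zeros stay implicit. Correct only if
    // each category map sends 0.0 to index 0 (assumption of this sketch).
    static double[] transformSparse(int[] indices, double[] values, Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = values.clone();
        for (int k = 0; k < indices.length; k++) {
            Map<Double, Integer> map = categoryMaps.get(indices[k]);
            if (map != null) {
                out[k] = lookup(map, values[k], indices[k]);
            }
        }
        return out;
    }

    // Assumed behavior: a value that is not in the feature's map is rejected.
    private static double lookup(Map<Double, Integer> map, double value, int feature) {
        Integer index = map.get(value);
        if (index == null) {
            throw new IllegalArgumentException("Unknown value " + value + " for categorical feature " + feature);
        }
        return index.doubleValue();
    }

    public static void main(String[] args) {
        // Feature 0 is categorical with values {0.0, 5.0, 7.0}; feature 1 is continuous.
        Map<Double, Integer> f0 = new HashMap<>();
        f0.put(0.0, 0);
        f0.put(5.0, 1);
        f0.put(7.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(0, f0);

        double[] dense = transformDense(new double[] {7.0, 3.5}, categoryMaps);
        System.out.println(Arrays.toString(dense)); // [2.0, 3.5]

        // Sparse vector of size 2 storing only index 0 = 5.0.
        double[] sparse = transformSparse(new int[] {0}, new double[] {5.0}, categoryMaps);
        System.out.println(Arrays.toString(sparse)); // [1.0]
    }
}
```

In the sparse call, the `indices` array is returned implicitly unchanged, so the output keeps exactly the same stored positions as the input.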
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> | categoryMaps() |
| VectorIndexerModel | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
| int | getMaxCategories() |
| java.util.Map<Integer,java.util.Map<Double,Integer>> | javaCategoryMaps(): Java-friendly version of categoryMaps. |
| IntParam | maxCategories(): Threshold for the number of values a categorical feature can take. |
| int | numFeatures() |
| VectorIndexerModel | setInputCol(String value) |
| VectorIndexerModel | setOutputCol(String value) |
| DataFrame | transform(DataFrame dataset): Transforms the input dataset. |
| StructType | transformSchema(StructType schema): :: DeveloperApi :: Derives the output schema from the input schema. |
| String | uid() |
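`javaCategoryMaps()` exposes the same information as `categoryMaps()` typed as `java.util.Map<Integer, java.util.Map<Double, Integer>>`. A common follow-up is decoding category indices back to original values. The sketch below assumes a map of that shape is already in hand (hard-coded here rather than obtained from a fitted model); `CategoryMapDecoder` and `invert` are hypothetical helpers, not part of Spark.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper operating on the shape returned by javaCategoryMaps().
public class CategoryMapDecoder {

    // Invert feature -> (original value -> index) into feature -> (index -> original value).
    // Each feature's indices are distinct, so the inversion loses nothing.
    static Map<Integer, Map<Integer, Double>> invert(Map<Integer, Map<Double, Integer>> categoryMaps) {
        Map<Integer, Map<Integer, Double>> inverted = new HashMap<>();
        for (Map.Entry<Integer, Map<Double, Integer>> feature : categoryMaps.entrySet()) {
            Map<Integer, Double> byIndex = new HashMap<>();
            for (Map.Entry<Double, Integer> e : feature.getValue().entrySet()) {
                byIndex.put(e.getValue(), e.getKey());
            }
            inverted.put(feature.getKey(), byIndex);
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Stand-in for model.javaCategoryMaps(): feature 3 has values {-1.0, 1.0}.
        Map<Double, Integer> f3 = new HashMap<>();
        f3.put(-1.0, 0);
        f3.put(1.0, 1);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(3, f3);

        Map<Integer, Map<Integer, Double>> decoder = invert(categoryMaps);
        System.out.println(decoder.get(3).get(1)); // 1.0
    }
}
```

With a fitted model, the argument to `invert` would be the result of `model.javaCategoryMaps()`.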
| Methods inherited from class org.apache.spark.ml.Model |
|---|
hasParent, parent, setParent |
| Methods inherited from class org.apache.spark.ml.Transformer |
|---|
transform, transform, transform |
| Methods inherited from class Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.spark.ml.param.Params |
|---|
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams |
| Methods inherited from interface org.apache.spark.Logging |
|---|
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
| Method Detail |
|---|
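public String uid()

public int numFeatures()

public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()

public java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()
Java-friendly version of categoryMaps.

public VectorIndexerModel setInputCol(String value)

public VectorIndexerModel setOutputCol(String value)

public DataFrame transform(DataFrame dataset)
Description copied from class: Transformer
Transforms the input dataset.
Specified by: transform in class Transformer
Parameters: dataset - (undocumented)

public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
:: DeveloperApi ::
Derives the output schema from the input schema.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)

public VectorIndexerModel copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Specified by: copy in class Model<VectorIndexerModel>
Parameters: extra - (undocumented)
See Also: defaultCopy()

public IntParam maxCategories()
Threshold for the number of values a categorical feature can take. If a feature is found to have more than maxCategories values, it is declared continuous. Must be >= 2. (default = 20)

public int getMaxCategories()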
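`maxCategories` is the threshold that decides whether a feature is categorical: a feature with more than `maxCategories` distinct values is treated as continuous. The sketch below applies that rule to a single feature column and builds its value-to-index map. `MaxCategoriesSketch` and `buildCategoryMap` are hypothetical; assigning indices in ascending value order is an assumption made for determinism, not a documented guarantee of this class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the maxCategories rule for one feature column.
public class MaxCategoriesSketch {

    // Returns a value -> 0-based index map, or null when the column has more
    // than maxCategories distinct values (i.e. the feature is continuous).
    static Map<Double, Integer> buildCategoryMap(double[] column, int maxCategories) {
        if (maxCategories < 2) {
            throw new IllegalArgumentException("maxCategories must be >= 2");
        }
        // Sorted set so that indices follow ascending value order (assumption).
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : column) {
            distinct.add(v);
            if (distinct.size() > maxCategories) {
                return null; // too many distinct values: continuous
            }
        }
        Map<Double, Integer> map = new HashMap<>();
        int index = 0;
        for (double v : distinct) {
            map.put(v, index++);
        }
        return map;
    }

    public static void main(String[] args) {
        double[] column = {3.0, 1.0, 3.0, 2.0};
        System.out.println(buildCategoryMap(column, 20).get(3.0)); // 2
        System.out.println(buildCategoryMap(column, 2));           // null
    }
}
```

Stopping as soon as the distinct count exceeds the threshold keeps memory bounded by `maxCategories + 1` values per feature, which is one reason a threshold of this kind is practical on wide data.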