VectorIndexerModel (Spark 1.4.1 JavaDoc)

org.apache.spark.ml.feature
Class VectorIndexerModel

Object
  org.apache.spark.ml.PipelineStage
      org.apache.spark.ml.Transformer
          org.apache.spark.ml.Model<VectorIndexerModel>
              org.apache.spark.ml.feature.VectorIndexerModel

All Implemented Interfaces:: java.io.Serializable, Logging, Params

public class VectorIndexerModel
extends Model<VectorIndexerModel>

:: Experimental ::

Transform categorical features to use 0-based indices instead of their original values:

- Categorical features are mapped to indices.
- Continuous features (columns) are left unchanged.

This also appends metadata to the output column, marking features as Numeric (continuous), Nominal (categorical), or Binary (either continuous or categorical). Non-ML metadata is not carried over from the input column to the output column.
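The per-row mapping described above can be illustrated without Spark. The following is a minimal plain-Java sketch, not Spark's implementation: a `double[]` stands in for a dense Spark `Vector`, the category maps use the same shape as `javaCategoryMaps()` (feature index to a map from original value to 0-based category index), and the names `DenseIndexingSketch` and `indexDense` are hypothetical. Throwing on an unseen category value is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: categorical features are replaced by 0-based indices,
// continuous features (absent from categoryMaps) are copied through unchanged.
final class DenseIndexingSketch {

    static double[] indexDense(double[] features, Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = Arrays.copyOf(features, features.length); // input is not mutated
        for (Map.Entry<Integer, Map<Double, Integer>> e : categoryMaps.entrySet()) {
            int featureIndex = e.getKey();
            Integer categoryIndex = e.getValue().get(features[featureIndex]);
            if (categoryIndex == null) {
                // Assumption: an unseen category value is an error in this sketch.
                throw new IllegalArgumentException(
                    "Unknown value " + features[featureIndex] + " for feature " + featureIndex);
            }
            out[featureIndex] = categoryIndex;
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature 0 is categorical with original values {0.0, -1.0, 5.0}; feature 1 is continuous.
        Map<Double, Integer> feature0 = new HashMap<>();
        feature0.put(0.0, 0);
        feature0.put(-1.0, 1);
        feature0.put(5.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(0, feature0);

        double[] row = {5.0, 3.7};
        System.out.println(Arrays.toString(indexDense(row, categoryMaps))); // [2.0, 3.7]
    }
}
```

Copying the input first keeps the transform side-effect free, which matches how a `Transformer` produces a new output column rather than editing the input column.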

This maintains vector sparsity.
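One way to see why sparsity can be preserved: only the explicitly stored entries of a sparse vector need remapping, and implicit zeros can stay implicit as long as every categorical feature maps the value 0.0 to index 0. The plain-Java sketch below assumes that 0.0-to-0 rule and represents a sparse vector as hypothetical parallel `indices`/`values` arrays rather than a Spark `SparseVector`; `SparseIndexingSketch` and `indexSparseValues` are illustrative names only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remap only the stored entries of a sparse vector.
// Assumption: each categorical feature maps 0.0 -> 0, so implicit zeros stay valid.
final class SparseIndexingSketch {

    static double[] indexSparseValues(int[] indices, double[] values,
                                      Map<Integer, Map<Double, Integer>> categoryMaps) {
        double[] out = Arrays.copyOf(values, values.length);
        for (int k = 0; k < indices.length; k++) {
            Map<Double, Integer> categoryMap = categoryMaps.get(indices[k]);
            if (categoryMap != null) { // categorical feature: replace value by its index
                Integer categoryIndex = categoryMap.get(values[k]);
                if (categoryIndex == null) {
                    throw new IllegalArgumentException(
                        "Unknown value " + values[k] + " for feature " + indices[k]);
                }
                out[k] = categoryIndex;
            } // continuous feature: value left unchanged
        }
        return out; // the indices array is reused as-is, so the sparsity pattern is unchanged
    }

    public static void main(String[] args) {
        Map<Double, Integer> feature2 = new HashMap<>();
        feature2.put(0.0, 0); // 0.0 -> 0 keeps implicit zeros correct
        feature2.put(4.0, 1);
        feature2.put(9.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(2, feature2);

        // Sparse vector of size 5 with stored entries {1: 0.5, 2: 9.0}; the rest are implicit zeros.
        int[] indices = {1, 2};
        double[] values = {0.5, 9.0};
        System.out.println(Arrays.toString(indexSparseValues(indices, values, categoryMaps))); // [0.5, 2.0]
    }
}
```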

param: numFeatures Number of features, i.e., the length of the Vectors this model transforms.
param: categoryMaps Feature value index. Keys are categorical feature indices (column indices). Values are maps from original feature values to 0-based category indices. If a feature is not in this map, it is treated as continuous.
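The two constructor parameters together determine how each feature is treated. The plain-Java sketch below walks features `0` to `numFeatures - 1` and reports a feature as continuous when it is absent from the category maps, and as categorical with `n` categories otherwise. It uses the `java.util.Map<Integer, java.util.Map<Double, Integer>>` shape returned by `javaCategoryMaps()`; `CategoryMapsSketch`, `numCategories`, and `describe` are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: classify features using numFeatures and categoryMaps.
final class CategoryMapsSketch {

    /** Number of categories for a feature, or -1 if it is continuous (absent from the map). */
    static int numCategories(Map<Integer, Map<Double, Integer>> categoryMaps, int featureIndex) {
        Map<Double, Integer> categoryMap = categoryMaps.get(featureIndex);
        return categoryMap == null ? -1 : categoryMap.size();
    }

    /** One line per feature, in feature-index order. */
    static String describe(Map<Integer, Map<Double, Integer>> categoryMaps, int numFeatures) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numFeatures; i++) {
            int n = numCategories(categoryMaps, i);
            sb.append("feature ").append(i).append(": ")
              .append(n < 0 ? "continuous" : "categorical (" + n + " categories)")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Double, Integer> feature1 = new HashMap<>();
        feature1.put(0.0, 0);
        feature1.put(1.0, 1);
        feature1.put(2.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(1, feature1);

        System.out.print(describe(categoryMaps, 3));
        // feature 0: continuous
        // feature 1: categorical (3 categories)
        // feature 2: continuous
    }
}
```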

See Also:: Serialized Form

Method Summary
`scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>>`	`categoryMaps()`
`VectorIndexerModel`	`copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params.
`int`	`getMaxCategories()`
`java.util.Map<Integer,java.util.Map<Double,Integer>>`	`javaCategoryMaps()` Java-friendly version of `categoryMaps`
`IntParam`	`maxCategories()` Threshold for the number of values a categorical feature can take.
`int`	`numFeatures()`
`VectorIndexerModel`	`setInputCol(String value)`
`VectorIndexerModel`	`setOutputCol(String value)`
`DataFrame`	`transform(DataFrame dataset)` Transforms the input dataset.
`StructType`	`transformSchema(StructType schema)` :: DeveloperApi :: Derives the output schema from the input schema.
`String`	`uid()`

Methods inherited from class org.apache.spark.ml.Model
`hasParent, parent, setParent`

Methods inherited from class org.apache.spark.ml.Transformer
`transform, transform, transform`

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.spark.ml.param.Params
`clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams`

Methods inherited from interface org.apache.spark.Logging
`initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning`

Method Detail

uid

public String uid()

numFeatures

public int numFeatures()

categoryMaps

public scala.collection.immutable.Map<Object,scala.collection.immutable.Map<Object,Object>> categoryMaps()

javaCategoryMaps

public java.util.Map<Integer,java.util.Map<Double,Integer>> javaCategoryMaps()

Java-friendly version of categoryMaps
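Because `javaCategoryMaps()` returns plain `java.util` collections, Java callers can post-process it directly. As a minimal sketch, the code below inverts each per-feature map so that a 0-based category index can be translated back to its original feature value. A hand-built map stands in for `model.javaCategoryMaps()` so the example runs without Spark; `InvertCategoryMapsSketch` and `invert` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: invert javaCategoryMaps()-shaped data to map indices back to values.
final class InvertCategoryMapsSketch {

    /** For each categorical feature, maps 0-based category index -> original value. */
    static Map<Integer, Map<Integer, Double>> invert(Map<Integer, Map<Double, Integer>> categoryMaps) {
        Map<Integer, Map<Integer, Double>> inverted = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<Integer, Map<Double, Integer>> feature : categoryMaps.entrySet()) {
            Map<Integer, Double> byIndex = new TreeMap<>();
            for (Map.Entry<Double, Integer> e : feature.getValue().entrySet()) {
                byIndex.put(e.getValue(), e.getKey());
            }
            inverted.put(feature.getKey(), byIndex);
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Stand-in for model.javaCategoryMaps(), built by hand so this runs without Spark.
        Map<Double, Integer> feature0 = new HashMap<>();
        feature0.put(0.0, 0);
        feature0.put(-1.0, 1);
        feature0.put(5.0, 2);
        Map<Integer, Map<Double, Integer>> categoryMaps = new HashMap<>();
        categoryMaps.put(0, feature0);

        System.out.println(invert(categoryMaps)); // {0={0=0.0, 1=-1.0, 2=5.0}}
    }
}
```

The inversion is lossless because each per-feature map assigns distinct indices to distinct original values.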

setInputCol

public VectorIndexerModel setInputCol(String value)

setOutputCol

public VectorIndexerModel setOutputCol(String value)

transform

public DataFrame transform(DataFrame dataset)

Description copied from class: Transformer

Transforms the input dataset.

Specified by:: transform in class Transformer

Parameters:: dataset - (undocumented)
Returns:: (undocumented)

transformSchema

public StructType transformSchema(StructType schema)

Description copied from class: PipelineStage

:: DeveloperApi ::

Derives the output schema from the input schema.

Specified by:: transformSchema in class PipelineStage

Parameters:: schema - (undocumented)
Returns:: (undocumented)

copy

public VectorIndexerModel copy(ParamMap extra)

Description copied from interface: Params

Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly.

Specified by:: copy in interface Params
Specified by:: copy in class Model<VectorIndexerModel>

Parameters:: extra - (undocumented)
Returns:: (undocumented)
See Also:: defaultCopy()

maxCategories

public IntParam maxCategories()

Threshold for the number of values a categorical feature can take. If a feature is found to have more than maxCategories distinct values, it is declared continuous. Must be >= 2.

(default = 20)

Returns:: (undocumented)
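The maxCategories rule can be sketched for a single feature column in plain Java: count distinct values, give up (treat the feature as continuous) as soon as the count exceeds maxCategories, and otherwise assign 0-based indices. This is a hypothetical illustration of the threshold, not the fitting code Spark uses. Assigning 0.0 to index 0 and the remaining values in ascending order is an assumption of this sketch, chosen so that sparse vectors stay sparse; `MaxCategoriesSketch` and `categoryMapOrNull` are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the maxCategories threshold for one feature column.
final class MaxCategoriesSketch {

    /** Value -> 0-based index if the column has at most maxCategories distinct values, else null. */
    static Map<Double, Integer> categoryMapOrNull(double[] column, int maxCategories) {
        if (maxCategories < 2) {
            throw new IllegalArgumentException("maxCategories must be >= 2, got " + maxCategories);
        }
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : column) {
            distinct.add(v);
            if (distinct.size() > maxCategories) {
                return null; // more than maxCategories distinct values: continuous
            }
        }
        Map<Double, Integer> categoryMap = new TreeMap<>();
        int next = 0;
        if (distinct.remove(0.0)) {
            categoryMap.put(0.0, next++); // assumption: 0.0 gets index 0 to preserve sparsity
        }
        for (double v : distinct) { // remaining values in ascending order
            categoryMap.put(v, next++);
        }
        return categoryMap;
    }

    public static void main(String[] args) {
        double[] fewValues = {3.0, 0.0, -2.0, 3.0, 0.0};
        double[] manyValues = {0.1, 0.2, 0.3, 0.4};
        System.out.println(categoryMapOrNull(fewValues, 3));  // {-2.0=1, 0.0=0, 3.0=2}
        System.out.println(categoryMapOrNull(manyValues, 3)); // null
    }
}
```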

getMaxCategories

public int getMaxCategories()