public class RegexTokenizer extends UnaryTransformer<String,scala.collection.Seq<String>,RegexTokenizer>
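A regex-based tokenizer that extracts tokens either by using the provided regex pattern to split the text (if gaps is true) or by repeatedly matching the regex (if gaps is false).
Optional parameters also allow filtering tokens by a minimum length.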
It returns an array of strings that can be empty.

| Constructor and Description |
|---|
| RegexTokenizer() |
| RegexTokenizer(String uid) |

| Modifier and Type | Method and Description |
|---|---|
| BooleanParam | gaps() Indicates whether regex splits on gaps (true) or matches tokens (false). |
| boolean | getGaps() |
| int | getMinTokenLength() |
| String | getPattern() |
| IntParam | minTokenLength() Minimum token length, >= 0. |
| Param<String> | pattern() Regex pattern used to match delimiters if gaps is true or tokens if gaps is false. |
| RegexTokenizer | setGaps(boolean value) |
| RegexTokenizer | setMinTokenLength(int value) |
| RegexTokenizer | setPattern(String value) |
| String | uid() |
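
The two modes selected by gaps can be illustrated with plain `java.util.regex`. This is only a sketch of the documented behavior, not Spark's implementation: the class `GapsSketch` and its `tokenize` method are hypothetical names introduced here, and the sketch assumes that "splits on gaps" corresponds to `Pattern.split` and "matches tokens" to collecting successive `Matcher.find` results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: mimics the two modes with java.util.regex.
final class GapsSketch {

    // gaps == true:  the pattern describes delimiters, so the text is split on it.
    // gaps == false: the pattern describes tokens, so every match is collected.
    static List<String> tokenize(String text, String pattern, boolean gaps) {
        Pattern re = Pattern.compile(pattern);
        if (gaps) {
            return new ArrayList<>(Arrays.asList(re.split(text)));
        }
        List<String> tokens = new ArrayList<>();
        Matcher m = re.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same input, two interpretations of the pattern.
        System.out.println(tokenize("Hello, regex world", "\\s+", true));
        System.out.println(tokenize("Hello, regex world", "\\w+", false));
    }
}
```

With the whitespace pattern in split mode, the comma stays attached to the first token ("Hello,"). With `\w+` in match mode, only the word characters are kept, so the comma is dropped.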
Methods inherited from UnaryTransformer: setInputCol, setOutputCol, transform, transformSchema

Methods inherited from Transformer: copy, transform, transform, transform

Methods inherited from Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

Methods inherited from Params: clear, copyValues, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, setDefault, shouldOwn, validateParams

public RegexTokenizer(String uid)
public RegexTokenizer()
public String uid()
public IntParam minTokenLength()
public RegexTokenizer setMinTokenLength(int value)
public int getMinTokenLength()
public BooleanParam gaps()
public RegexTokenizer setGaps(boolean value)
public boolean getGaps()
public Param<String> pattern()
Regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
Default: "\\s+"
public RegexTokenizer setPattern(String value)
public String getPattern()
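
As a rough illustration of how the default pattern and the minimum token length interact, the sketch below splits on `"\\s+"` and then drops short tokens. It is plain Java, not Spark code: `MinLengthSketch` and its `tokenize` method are hypothetical names, and the filter rule (keep a token when its length is at least the minimum) is an assumption based on the parameter description above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: split on a delimiter pattern,
// then keep only tokens whose length is at least minTokenLength (assumed rule).
final class MinLengthSketch {

    static List<String> tokenize(String text, String pattern, int minTokenLength) {
        if (minTokenLength < 0) {
            throw new IllegalArgumentException("minTokenLength must be >= 0");
        }
        return Pattern.compile(pattern)
                .splitAsStream(text)
                .filter(token -> token.length() >= minTokenLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In Java source, "\\s+" denotes the regex \s+ (one or more whitespace characters).
        System.out.println(tokenize("to be or not to be", "\\s+", 3));
        // Every token is shorter than 3 characters, so the result is empty.
        System.out.println(tokenize("to be or", "\\s+", 3));
    }
}
```

The second call shows why the returned sequence can be empty: when no token reaches the minimum length, nothing survives the filter.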