Object
org.apache.spark.mllib.util.LinearDataGenerator
public class LinearDataGenerator
:: DeveloperApi ::
Generates sample data for linear models. This class draws
uniformly random values for every feature and adds Gaussian noise, scaled by eps, to the
response variable Y.
| Constructor Summary | |
|---|---|
LinearDataGenerator()
|
|
| Method Summary | |
|---|---|
static scala.collection.Seq<LabeledPoint> |
generateLinearInput(double intercept,
double[] weights,
double[] xMean,
double[] xVariance,
int nPoints,
int seed,
double eps)
|
static scala.collection.Seq<LabeledPoint> |
generateLinearInput(double intercept,
double[] weights,
int nPoints,
int seed,
double eps)
For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which is 1.0/3.0 here. |
static java.util.List<LabeledPoint> |
generateLinearInputAsList(double intercept,
double[] weights,
int nPoints,
int seed,
double eps)
Return a Java List of synthetic data randomly generated according to a multi collinear model. |
static RDD<LabeledPoint> |
generateLinearRDD(SparkContext sc,
int nexamples,
int nfeatures,
double eps,
int nparts,
double intercept)
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants. |
static void |
main(String[] args)
|
| Methods inherited from class Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public LinearDataGenerator()
| Method Detail |
|---|
public static java.util.List<LabeledPoint> generateLinearInputAsList(double intercept,
double[] weights,
int nPoints,
int seed,
double eps)
intercept - Data intercept.
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed.
eps - (undocumented)
public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept,
double[] weights,
int nPoints,
int seed,
double eps)
intercept - Data intercept.
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed.
eps - Epsilon scaling factor.
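The compatibility note for this overload rests on the mean and variance of a uniform distribution. As a quick check, assuming each feature X is drawn uniformly from [a, b] = [-1, 1]:

```latex
\[
\mathbb{E}[X] = \frac{a + b}{2} = \frac{-1 + 1}{2} = 0,
\qquad
\operatorname{Var}(X) = \frac{(b - a)^2}{12} = \frac{(1 - (-1))^2}{12} = \frac{4}{12} = \frac{1}{3}.
\]
```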
public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept,
double[] weights,
double[] xMean,
double[] xVariance,
int nPoints,
int seed,
double eps)
intercept - Data intercept.
weights - Weights to be applied.
xMean - The mean of the generated features. If the features are not properly standardized, a poorly implemented algorithm will often have difficulty converging.
xVariance - The variance of the generated features.
nPoints - Number of points in sample.
seed - Random seed.
eps - Epsilon scaling factor.
public static RDD<LabeledPoint> generateLinearRDD(SparkContext sc,
int nexamples,
int nfeatures,
double eps,
int nparts,
double intercept)
sc - SparkContext to be used for generating the RDD.
nexamples - Number of examples that will be contained in the RDD.
nfeatures - Number of features to generate for each example.
eps - Epsilon factor by which examples are scaled.
nparts - Number of partitions in the RDD. Default value is 2.
intercept - (undocumented)
public static void main(String[] args)