Object
org.apache.spark.mllib.stat.Statistics
public class Statistics
:: Experimental :: API for statistical functions in MLlib.
| Constructor Summary |
|---|
| Statistics() |
| Method Summary | |
|---|---|
| static ChiSqTestResult | chiSqTest(Matrix observed): Conduct Pearson's independence test on the input contingency matrix, which cannot contain negative entries or columns or rows that sum up to 0. |
| static ChiSqTestResult[] | chiSqTest(RDD<LabeledPoint> data): Conduct Pearson's independence test for every feature against the label across the input RDD. |
| static ChiSqTestResult | chiSqTest(Vector observed): Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size. |
| static ChiSqTestResult | chiSqTest(Vector observed, Vector expected): Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution. |
| static MultivariateStatisticalSummary | colStats(RDD<Vector> X): Computes column-wise summary statistics for the input RDD[Vector]. |
| static double | corr(JavaRDD<Double> x, JavaRDD<Double> y): Java-friendly version of corr(). |
| static double | corr(JavaRDD<Double> x, JavaRDD<Double> y, String method): Java-friendly version of corr(). |
| static double | corr(RDD<Object> x, RDD<Object> y): Compute the Pearson correlation for the input RDDs. |
| static double | corr(RDD<Object> x, RDD<Object> y, String method): Compute the correlation for the input RDDs using the specified method. |
| static Matrix | corr(RDD<Vector> X): Compute the Pearson correlation matrix for the input RDD of Vectors. |
| static Matrix | corr(RDD<Vector> X, String method): Compute the correlation matrix for the input RDD of Vectors using the specified method. |
| Methods inherited from class Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public Statistics()
| Method Detail |
|---|
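public static MultivariateStatisticalSummary colStats(RDD<Vector> X)
Computes column-wise summary statistics for the input RDD[Vector].
Parameters:
X - an RDD[Vector] for which column-wise summary statistics are to be computed.
Returns:
MultivariateStatisticalSummary object containing column-wise summary statistics.
public static Matrix corr(RDD<Vector> X)
Compute the Pearson correlation matrix for the input RDD of Vectors.
Parameters:
X - an RDD[Vector] for which the correlation matrix is to be computed.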
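
To make the quantity returned by corr(RDD<Vector> X) concrete, here is a minimal plain-Java sketch of a Pearson correlation matrix over the columns of a small in-memory dataset. It is not MLlib's implementation and does not use Spark; the class name ColumnCorrelationSketch and the double[][] row layout are assumptions made for illustration only.

```java
// Illustrative only: a local, non-distributed sketch, not MLlib's implementation.
final class ColumnCorrelationSketch {

    /**
     * Returns the Pearson correlation matrix of the columns of rows,
     * where each inner array is one observation (one vector).
     * A constant column yields NaN entries because its variance is zero.
     */
    static double[][] pearsonMatrix(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length;

        // Column means.
        double[] mean = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }

        // Centered co-moments: sum over rows of (x_j - mean_j) * (x_k - mean_k).
        double[][] comoment = new double[d][d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                for (int k = 0; k < d; k++) {
                    comoment[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
                }
            }
        }

        // Normalize each co-moment by the two column spreads.
        double[][] corr = new double[d][d];
        for (int j = 0; j < d; j++) {
            for (int k = 0; k < d; k++) {
                corr[j][k] = comoment[j][k] / Math.sqrt(comoment[j][j] * comoment[k][k]);
            }
        }
        return corr;
    }
}
```

The result is symmetric with ones on the diagonal, so entry (j, k) compares column j with column k.
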
public static Matrix corr(RDD<Vector> X,
String method)
Compute the correlation matrix for the input RDD of Vectors using the specified method.
Methods currently supported: pearson (default), spearman.
Note that Spearman is a rank correlation: each column must be turned into an RDD[Double] and sorted to retrieve its ranks, and the columns are then joined back into an RDD[Vector], which is fairly costly. Cache the input RDD before calling corr with method = "spearman" to avoid recomputing the common lineage.
Parameters:
X - an RDD[Vector] for which the correlation matrix is to be computed.
method - String specifying the method to use for computing correlation. Supported: pearson (default), spearman
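
The sort-then-rank step described in the note can be sketched locally in plain Java: replace each value by its rank, then compute an ordinary Pearson correlation on the ranks. This is an illustration of the statistic, not MLlib's distributed implementation; the class name SpearmanSketch is hypothetical, and giving tied values the average of their positions is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: Spearman correlation as Pearson correlation on ranks.
final class SpearmanSketch {

    /** Ranks start at 1; tied values share the average of their positions (sketch assumption). */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by the value they point at.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));

        double[] rank = new double[n];
        int start = 0;
        while (start < n) {
            // Extend the run [start, end] over equal values.
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                rank[order[k]] = average;
            }
            start = end + 1;
        }
        return rank;
    }

    /** Pearson correlation of two equally long sequences. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman correlation: rank both inputs, then correlate the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

Because only the ordering matters, any strictly increasing relationship between x and y gives a Spearman correlation of 1 even when the Pearson correlation is below 1.
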
public static double corr(RDD<Object> x,
RDD<Object> y)
Compute the Pearson correlation for the input RDDs.
Note: the two input RDDs need to have the same number of partitions and the same number of elements in each partition.
Parameters:
x - RDD[Double] of the same cardinality as y.
y - RDD[Double] of the same cardinality as x.
public static double corr(JavaRDD<Double> x,
JavaRDD<Double> y)
Java-friendly version of corr().
public static double corr(RDD<Object> x,
RDD<Object> y,
String method)
Compute the correlation for the input RDDs using the specified method.
Methods currently supported: pearson (default), spearman.
Note: the two input RDDs need to have the same number of partitions and the same number of elements in each partition.
Parameters:
x - RDD[Double] of the same cardinality as y.
y - RDD[Double] of the same cardinality as x.
method - String specifying the method to use for computing correlation. Supported: pearson (default), spearman
public static double corr(JavaRDD<Double> x,
JavaRDD<Double> y,
String method)
Java-friendly version of corr().
public static ChiSqTestResult chiSqTest(Vector observed,
Vector expected)
Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution.
Note: the two input Vectors need to have the same size. observed cannot contain negative values. expected cannot contain nonpositive values.
Parameters:
observed - Vector containing the observed categorical counts/relative frequencies.
expected - Vector containing the expected categorical counts/relative frequencies. expected is rescaled if the expected sum differs from the observed sum.
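
The constraints and the rescaling rule above can be illustrated with a small plain-Java sketch of the Pearson chi-squared goodness of fit statistic. It is not MLlib's code; the class GoodnessOfFitSketch and its method names are hypothetical, and the degrees of freedom (number of categories minus one) follow the textbook definition rather than anything stated on this page.

```java
// Illustrative only: local computation of the goodness of fit statistic.
final class GoodnessOfFitSketch {

    /** Sum over categories of (observed - expected)^2 / expected, after rescaling expected. */
    static double statistic(double[] observed, double[] expected) {
        if (observed.length != expected.length) {
            throw new IllegalArgumentException("observed and expected must have the same size");
        }
        double observedSum = 0.0;
        double expectedSum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] < 0.0) {
                throw new IllegalArgumentException("observed cannot contain negative values");
            }
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected cannot contain nonpositive values");
            }
            observedSum += observed[i];
            expectedSum += expected[i];
        }
        // Rescale expected so that both vectors have the same total.
        double scale = observedSum / expectedSum;
        double statistic = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double e = expected[i] * scale;
            double diff = observed[i] - e;
            statistic += diff * diff / e;
        }
        return statistic;
    }

    /** Textbook degrees of freedom for a goodness of fit test. */
    static int degreesOfFreedom(double[] observed) {
        return observed.length - 1;
    }
}
```

Passing an expected vector of all ones reproduces the uniform case: with observed counts 10, 20, 30 each expected value is rescaled to 20.
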
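public static ChiSqTestResult chiSqTest(Vector observed)
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size.
Note: observed cannot contain negative values.
Parameters:
observed - Vector containing the observed categorical counts/relative frequencies.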
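public static ChiSqTestResult chiSqTest(Matrix observed)
Conduct Pearson's independence test on the input contingency matrix, which cannot contain negative entries or columns or rows that sum up to 0.
Parameters:
observed - The contingency matrix (containing either counts or relative frequencies).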
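
For the contingency matrix case, the statistic compares each cell with the value expected under independence, rowSum * colSum / total. The plain-Java sketch below shows that computation and why zero-sum rows or columns are rejected (they would make an expected value zero). It is an illustration, not MLlib's implementation; the class IndependenceSketch is hypothetical, and the degrees of freedom use the textbook (rows - 1) * (columns - 1).

```java
// Illustrative only: Pearson's independence statistic on a small dense table.
final class IndependenceSketch {

    /** Sum over cells of (observed - expected)^2 / expected under independence. */
    static double statistic(double[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (counts[i][j] < 0.0) {
                    throw new IllegalArgumentException("matrix cannot contain negative entries");
                }
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        for (double sum : rowSum) {
            if (sum == 0.0) throw new IllegalArgumentException("a row sums up to 0");
        }
        for (double sum : colSum) {
            if (sum == 0.0) throw new IllegalArgumentException("a column sums up to 0");
        }
        double statistic = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected cell value if row and column categories were independent.
                double e = rowSum[i] * colSum[j] / total;
                double diff = counts[i][j] - e;
                statistic += diff * diff / e;
            }
        }
        return statistic;
    }

    /** Textbook degrees of freedom for an r x c contingency table. */
    static int degreesOfFreedom(double[][] counts) {
        return (counts.length - 1) * (counts[0].length - 1);
    }
}
```
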
public static ChiSqTestResult[] chiSqTest(RDD<LabeledPoint> data)
Conduct Pearson's independence test for every feature against the label across the input RDD.
Parameters:
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features will be treated as categorical for each distinct value.
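
The remark that each distinct feature value is treated as its own category can be pictured as building one contingency table per feature, which is then tested for independence against the label. The plain-Java sketch below builds such a table for a single feature from parallel arrays. It is a local illustration, not MLlib's implementation; the class FeatureContingencySketch, the parallel-array input, and the choice of feature values as rows and labels as columns are all assumptions of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: count (feature value, label) pairs for one feature.
final class FeatureContingencySketch {

    /**
     * Returns a table with one row per distinct feature value and one column
     * per distinct label, both in ascending order; each cell counts the points
     * having that (value, label) combination.
     */
    static double[][] contingency(double[] labels, double[] featureValues) {
        TreeMap<Double, Integer> valueIndex = indexDistinct(featureValues);
        TreeMap<Double, Integer> labelIndex = indexDistinct(labels);
        double[][] table = new double[valueIndex.size()][labelIndex.size()];
        for (int i = 0; i < labels.length; i++) {
            int row = valueIndex.get(featureValues[i]);
            int col = labelIndex.get(labels[i]);
            table[row][col] += 1.0;
        }
        return table;
    }

    /** Maps each distinct value to its position in ascending order. */
    static TreeMap<Double, Integer> indexDistinct(double[] values) {
        TreeMap<Double, Integer> index = new TreeMap<>();
        for (double v : values) {
            index.put(v, 0);
        }
        int next = 0;
        for (Map.Entry<Double, Integer> entry : index.entrySet()) {
            entry.setValue(next++);
        }
        return index;
    }
}
```

A continuous feature with many distinct values produces a table with many sparse rows, which is why this test is intended for categorical features.
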