public class JavaSchemaRDD extends Object implements JavaRDDLike<Row,JavaRDD<Row>>
An RDD of Row objects that is returned as the result of a Spark SQL query. In addition to
standard RDD operations, a JavaSchemaRDD can also be registered as a table in the JavaSQLContext
that was used to create it. Registering a JavaSchemaRDD allows its contents to be queried in
future SQL statements.
| Constructor and Description |
|---|
| `JavaSchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan)` |
| Modifier and Type | Method and Description |
|---|---|
| `SchemaRDD` | `baseSchemaRDD()` |
| `JavaSchemaRDD` | `cache()` Persist this RDD with the default storage level (`MEMORY_ONLY`). |
| `scala.reflect.ClassTag<Row>` | `classTag()` |
| `JavaSchemaRDD` | `coalesce(int numPartitions, boolean shuffle)` Return a new RDD that is reduced into `numPartitions` partitions. |
| `JavaSchemaRDD` | `distinct()` Return a new RDD containing the distinct elements in this RDD. |
| `JavaSchemaRDD` | `distinct(int numPartitions)` Return a new RDD containing the distinct elements in this RDD. |
| `JavaSchemaRDD` | `filter(Function<Row,Boolean> f)` Return a new RDD containing only the elements that satisfy a predicate. |
| `JavaSchemaRDD` | `intersection(JavaSchemaRDD other)` Return the intersection of this RDD and another one. |
| `JavaSchemaRDD` | `intersection(JavaSchemaRDD other, int numPartitions)` Return the intersection of this RDD and another one. |
| `JavaSchemaRDD` | `intersection(JavaSchemaRDD other, Partitioner partitioner)` Return the intersection of this RDD and another one. |
| `JavaSchemaRDD` | `persist()` Persist this RDD with the default storage level (`MEMORY_ONLY`). |
| `JavaSchemaRDD` | `persist(StorageLevel newLevel)` Set this RDD's storage level to persist its values across operations after the first time it is computed. |
| `RDD<Row>` | `rdd()` |
| `JavaSchemaRDD` | `repartition(int numPartitions)` Return a new RDD that has exactly `numPartitions` partitions. |
| `JavaSchemaRDD` | `setName(String name)` Assign a name to this RDD. |
| `SQLContext` | `sqlContext()` |
| `JavaSchemaRDD` | `subtract(JavaSchemaRDD other)` Return an RDD with the elements from `this` that are not in `other`. |
| `JavaSchemaRDD` | `subtract(JavaSchemaRDD other, int numPartitions)` Return an RDD with the elements from `this` that are not in `other`. |
| `JavaSchemaRDD` | `subtract(JavaSchemaRDD other, Partitioner p)` Return an RDD with the elements from `this` that are not in `other`. |
| `String` | `toString()` |
| `JavaSchemaRDD` | `unpersist(boolean blocking)` Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. |
| `JavaRDD<Row>` | `wrapRDD(RDD<Row> rdd)` |
Methods inherited from interface JavaRDDLike: aggregate, cartesian, checkpoint, collect, collectPartitions, context, count, countApprox, countApprox, countApproxDistinct, countByValue, countByValueApprox, countByValueApprox, first, flatMap, flatMapToDouble, flatMapToPair, fold, foreach, foreachPartition, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, id, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitions, mapPartitionsToDouble, mapPartitionsToDouble, mapPartitionsToPair, mapPartitionsToPair, mapPartitionsWithIndex, mapToDouble, mapToPair, max, min, name, pipe, pipe, pipe, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, splits, take, takeOrdered, takeOrdered, takeSample, takeSample, toArray, toDebugString, toLocalIterator, top, top, zip, zipPartitions, zipWithIndex, zipWithUniqueId
public JavaSchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan)
public SQLContext sqlContext()
public SchemaRDD baseSchemaRDD()
public scala.reflect.ClassTag<Row> classTag()
Specified by: classTag in interface JavaRDDLike<Row,JavaRDD<Row>>
public JavaRDD<Row> wrapRDD(RDD<Row> rdd)
Specified by: wrapRDD in interface JavaRDDLike<Row,JavaRDD<Row>>
public String toString()
Overrides: toString in class Object
public JavaSchemaRDD cache()
public JavaSchemaRDD persist()
public JavaSchemaRDD persist(StorageLevel newLevel)
public JavaSchemaRDD unpersist(boolean blocking)
blocking - Whether to block until all blocks are deleted.
public JavaSchemaRDD setName(String name)
public JavaSchemaRDD coalesce(int numPartitions, boolean shuffle)
Return a new RDD that is reduced into `numPartitions` partitions.
public JavaSchemaRDD distinct()
public JavaSchemaRDD distinct(int numPartitions)
public JavaSchemaRDD filter(Function<Row,Boolean> f)
public JavaSchemaRDD intersection(JavaSchemaRDD other)
Note that this method performs a shuffle internally.
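The result of an intersection can be illustrated without a cluster. The following is a plain-Java sketch, not Spark code: it models an RDD as a `List`, and the class `IntersectionSketch` and its `intersection` helper are hypothetical names introduced only for illustration. It assumes the behavior of the underlying `RDD.intersection`, whose output contains each common element once even when the inputs contain duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IntersectionSketch {
    // Hypothetical model of RDD-style intersection:
    // keep the elements present in both inputs, each exactly once.
    static <T> List<T> intersection(List<T> left, List<T> right) {
        Set<T> common = new LinkedHashSet<>(left);    // drops duplicates from the left input
        common.retainAll(new LinkedHashSet<>(right)); // keeps only elements also on the right
        return new ArrayList<>(common);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("ann", "bob", "bob", "cy");
        List<String> b = Arrays.asList("bob", "cy", "cy", "dee");
        System.out.println(intersection(a, b)); // prints [bob, cy]
    }
}
```

The sketch keeps first-seen order for readability. A real RDD gives no such ordering guarantee, since the elements pass through a shuffle.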
public JavaSchemaRDD intersection(JavaSchemaRDD other, Partitioner partitioner)
Note that this method performs a shuffle internally.
partitioner - Partitioner to use for the resulting RDD
public JavaSchemaRDD intersection(JavaSchemaRDD other, int numPartitions)
Note that this method performs a shuffle internally.
numPartitions - How many partitions to use in the resulting RDD
public JavaSchemaRDD repartition(int numPartitions)
Return a new RDD that has exactly `numPartitions` partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce,
which can avoid performing a shuffle.
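The difference between the two operations can be sketched in plain Java. This is a simplified model, not Spark's implementation: partitions are modeled as nested lists, and `PartitionSketch`, `coalesce`, `repartition`, and `emptyPartitions` are hypothetical helpers. The model assumes that a coalesce without a shuffle merges whole partitions and cannot increase the partition count, while a repartition moves individual elements. Round-robin placement is used here only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    // Create n empty, mutable partitions.
    static <T> List<List<T>> emptyPartitions(int n) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new ArrayList<>());
        return out;
    }

    // Coalesce-style narrowing: whole input partitions are merged together,
    // so no element is redistributed on its own.
    static <T> List<List<T>> coalesce(List<List<T>> parts, int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be positive");
        int n = Math.min(numPartitions, parts.size()); // without a shuffle the count cannot grow
        List<List<T>> out = emptyPartitions(n);
        for (int i = 0; i < parts.size(); i++) {
            // Contiguous grouping: input partition i lands in output partition i * n / size.
            out.get(i * n / parts.size()).addAll(parts.get(i));
        }
        return out;
    }

    // Repartition-style shuffle: every element is redistributed individually.
    static <T> List<List<T>> repartition(List<List<T>> parts, int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be positive");
        List<List<T>> out = emptyPartitions(numPartitions);
        int next = 0;
        for (List<T> part : parts) {
            for (T element : part) out.get(next++ % numPartitions).add(element);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5), Arrays.asList(6));
        System.out.println(coalesce(parts, 2));        // prints [[1, 2, 3], [4, 5, 6]]
        System.out.println(repartition(parts, 3));     // prints [[1, 4], [2, 5], [3, 6]]
        System.out.println(coalesce(parts, 6).size()); // prints 4
    }
}
```

In the model, `coalesce` only concatenates existing partitions, which is why it can skip the shuffle when the partition count is being reduced.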
public JavaSchemaRDD subtract(JavaSchemaRDD other)
Return an RDD with the elements from `this` that are not in `other`.
Uses this RDD's partitioner and partition count because, even if `other` is huge, the resulting
RDD will be no larger than this one.
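The size bound mentioned above can be checked with a small plain-Java model. This is a sketch, not Spark code: an RDD is modeled as a `List`, and `SubtractSketch` and its `subtract` helper are hypothetical names. It assumes the behavior of the underlying `RDD.subtract`, which drops every element that also occurs in `other` and keeps the remaining elements, duplicates included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubtractSketch {
    // Hypothetical model of RDD-style subtract:
    // keep every element of self that does not occur in other.
    static <T> List<T> subtract(List<T> self, List<T> other) {
        Set<T> exclude = new HashSet<>(other);
        List<T> result = new ArrayList<>();
        for (T element : self) {
            if (!exclude.contains(element)) result.add(element);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> self = Arrays.asList("a", "a", "b", "c");
        List<String> other = Arrays.asList("b", "d", "e", "f", "g", "h");
        List<String> result = subtract(self, other);
        System.out.println(result);                       // prints [a, a, c]
        System.out.println(result.size() <= self.size()); // prints true
    }
}
```

Because the result is filtered from `self` alone, its size never exceeds the size of `self`, however large `other` is.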
public JavaSchemaRDD subtract(JavaSchemaRDD other, int numPartitions)
Return an RDD with the elements from `this` that are not in `other`.
public JavaSchemaRDD subtract(JavaSchemaRDD other, Partitioner p)
Return an RDD with the elements from `this` that are not in `other`.