Interface SupportsPushDownV2Filters
- All Superinterfaces:
ScanBuilder
A mix-in interface for ScanBuilder. Data sources can implement this interface to
push down V2 Predicate to the data source and reduce the size of the data to be read.
Please note that this interface is preferred over SupportsPushDownFilters, which uses
V1 Filter and is less efficient due to the
internal -> external data conversion.
Iterative pushdown: When supportsIterativePushdown() returns true,
pushPredicates(Predicate[]) may be called multiple times on the same
ScanBuilder instance with additional predicates (e.g. PartitionPredicate).
The implementation must accumulate state across all calls, and
pushedPredicates() must return predicates from all of them.
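As a rough illustration of this contract (a sketch, not code taken from Spark), the example below shows a builder that accumulates pushed predicates across calls and opts into iterative pushdown. MyScanBuilder, MyScan, and isSupported(...) are hypothetical placeholders for the data source's own scan implementation and capability check.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.connector.expressions.filter.Predicate;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsPushDownV2Filters;

// Hedged sketch: a ScanBuilder that keeps state across repeated
// pushPredicates calls, as required when iterative pushdown is enabled.
public class MyScanBuilder implements SupportsPushDownV2Filters {

  // Predicates pushed so far, accumulated across *all* pushPredicates calls.
  private final List<Predicate> pushed = new ArrayList<>();

  @Override
  public Predicate[] pushPredicates(Predicate[] predicates) {
    List<Predicate> postScan = new ArrayList<>();
    for (Predicate p : predicates) {
      if (isSupported(p)) {
        pushed.add(p);      // evaluated by the data source
      } else {
        postScan.add(p);    // returned as-is; Spark evaluates it after the scan
      }
    }
    return postScan.toArray(new Predicate[0]);
  }

  @Override
  public Predicate[] pushedPredicates() {
    // Must reflect every pushPredicates call, not just the latest one.
    return pushed.toArray(new Predicate[0]);
  }

  @Override
  public boolean supportsIterativePushdown() {
    return true;            // allow Spark to call pushPredicates again with more predicates
  }

  @Override
  public Scan build() {
    return new MyScan(pushed);  // hypothetical Scan that applies the pushed predicates
  }

  // Hypothetical capability check; a real source would inspect predicate
  // names, referenced columns, data types, etc.
  private boolean isSupported(Predicate p) {
    return "=".equals(p.name()) || ">".equals(p.name());
  }
}
```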
- Since:
- 3.3.0
Method Summary
- Predicate[] pushedPredicates()
  Returns the predicates that are pushed to the data source via pushPredicates(Predicate[]).
- Predicate[] pushPredicates(Predicate[] predicates)
  Pushes down predicates, and returns predicates that need to be evaluated after scanning.
- default boolean supportsIterativePushdown()
  Returns true if this data source supports iterative filter pushdown.
Methods inherited from interface org.apache.spark.sql.connector.read.ScanBuilder
- build
Method Details
pushPredicates
Predicate[] pushPredicates(Predicate[] predicates)
Pushes down predicates, and returns predicates that need to be evaluated after scanning. Any predicate that the data source cannot fully push down must be returned as-is so that Spark can evaluate it after the scan; the data source must not modify or drop such predicates.
Rows should be returned from the data source if and only if all of the predicates match. That is, predicates must be interpreted as ANDed together.
This method may be called multiple times with additional predicates (e.g. PartitionPredicate when supportsIterativePushdown() returns true). The implementation must accumulate state across all calls so that pushedPredicates() can return predicates from all of them.
For each PartitionPredicate, the implementation can use PartitionPredicate.references() (each PartitionFieldReference has PartitionFieldReference.ordinal()) to decide whether to return it for post-scan filtering. For example, data sources with partition spec evolution may return predicates that reference later-added partition transforms (incompletely partitioned data) so Spark evaluates them after the scan, while predicates that reference only initially-added partition transforms may be fully pushed.
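As one hedged illustration of the "returned as-is" rule, the sketch below shows a capability check a source might plug into the isSupported(...) hook of the earlier example: a compound predicate is treated as fully pushable only if every child is. The class name, the knownColumns parameter, and the set of predicate names tested are assumptions for illustration, not part of this API.

```java
import java.util.Set;

import org.apache.spark.sql.connector.expressions.Expression;
import org.apache.spark.sql.connector.expressions.NamedReference;
import org.apache.spark.sql.connector.expressions.filter.Predicate;

// Hedged sketch of a capability check. A predicate the source cannot
// evaluate exactly must come back from pushPredicates unchanged -- it is
// never rewritten to keep only the parts the source understands.
final class PushdownCapability {

  static boolean canEvaluateExactly(Predicate p, Set<String> knownColumns) {
    switch (p.name()) {
      case "=":
      case "<":
      case ">":
        return referencesOnlyKnownColumns(p, knownColumns);
      case "AND":
      case "OR":
        // Compound predicates are exact only if every child is.
        for (Expression child : p.children()) {
          if (!(child instanceof Predicate)
              || !canEvaluateExactly((Predicate) child, knownColumns)) {
            return false;
          }
        }
        return true;
      default:
        return false;
    }
  }

  // Hypothetical helper: every column referenced by the predicate must exist
  // in the source's schema.
  static boolean referencesOnlyKnownColumns(Predicate p, Set<String> knownColumns) {
    for (NamedReference ref : p.references()) {
      if (!knownColumns.contains(String.join(".", ref.fieldNames()))) {
        return false;
      }
    }
    return true;
  }
}
```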
pushedPredicates
Predicate[] pushedPredicates()
Returns the predicates that are pushed to the data source via pushPredicates(Predicate[]). There are 3 kinds of predicates:
- pushable predicates which don't need to be evaluated again after scanning.
- pushable predicates which still need to be evaluated after scanning, e.g. parquet row group predicate.
- non-pushable predicates.
Both cases 1 and 2 should be considered as pushed predicates and should be returned by this method.
When iterative pushdown is supported and pushPredicates(Predicate[]) was called multiple times, this method must return predicates from all calls.
It's possible that there are no predicates in the query and pushPredicates(Predicate[]) is never called; an empty array should be returned in this case.
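A hedged sketch of how the three kinds might be tracked is shown below. isExact(...) and isInexact(...) are hypothetical, source-specific checks; an "inexact" predicate could be, say, a row-group-statistics filter that prunes data but cannot guarantee every surviving row matches.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.connector.expressions.filter.Predicate;

// Hedged sketch of the bookkeeping for the three kinds of predicates.
abstract class ThreeKindsExample {

  private final List<Predicate> pushed = new ArrayList<>();  // kinds 1 and 2

  public Predicate[] pushPredicates(Predicate[] predicates) {
    List<Predicate> postScan = new ArrayList<>();
    for (Predicate p : predicates) {
      if (isExact(p)) {
        pushed.add(p);        // kind 1: pushed, not evaluated again by Spark
      } else if (isInexact(p)) {
        pushed.add(p);        // kind 2: pushed (e.g. row-group pruning) ...
        postScan.add(p);      // ... but Spark must still evaluate it
      } else {
        postScan.add(p);      // kind 3: not pushed, returned as-is
      }
    }
    return postScan.toArray(new Predicate[0]);
  }

  public Predicate[] pushedPredicates() {
    // Kinds 1 and 2, accumulated across every pushPredicates call; an empty
    // array if pushPredicates was never called.
    return pushed.toArray(new Predicate[0]);
  }

  // Hypothetical source-specific capability checks.
  abstract boolean isExact(Predicate p);
  abstract boolean isInexact(Predicate p);
}
```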
supportsIterativePushdown
default boolean supportsIterativePushdown()
Returns true if this data source supports iterative filter pushdown. When true, pushPredicates(Predicate[]) may be called multiple times with additional predicates (e.g. PartitionPredicate). The implementation must accumulate state across all calls, and pushedPredicates() must return predicates from all of them. See the class-level Javadoc for the full contract.
- Since:
- 4.2.0