Interface SupportsRuntimeV2Filtering
- All Superinterfaces:
Scan
- All Known Subinterfaces:
SupportsRuntimeFiltering
Scan. Data sources can implement this interface if they can
filter initially planned InputPartitions using predicates Spark infers at runtime.
This interface is very similar to SupportsRuntimeFiltering except it uses
data source V2 Predicate instead of data source V1 Filter.
SupportsRuntimeV2Filtering is preferred over SupportsRuntimeFiltering
and only one of them should be implemented by the data sources.
Iterative filtering: When supportsIterativePushdown() returns true,
filter(Predicate[]) may be called multiple times on the same
Scan instance. The first call pushes translated V2 predicates; the second call
pushes PartitionPredicate instances derived from runtime filters whose translated
form was not already accepted (via pushedPredicates()) in the first call.
The implementation must accumulate state across all calls, and
pushedPredicates() must return predicates from all of them.
Note that Spark will push runtime filters only if they are beneficial.
- Since:
- 3.4.0
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.sql.connector.read.Scan
Scan.ColumnarSupportMode -
Method Summary
Modifier and TypeMethodDescriptionvoidFilters this scan using runtime predicates.Returns attributes this scan can be filtered by at runtime.default Predicate[]Returns the predicates that are pushed to the data source viafilter(Predicate[]).default booleanReturns true if this scan supports iterative runtime filtering.Methods inherited from interface org.apache.spark.sql.connector.read.Scan
columnarSupportMode, description, readSchema, reportDriverMetrics, supportedCustomMetrics, toBatch, toContinuousStream, toMicroBatchStream
-
Method Details
-
filterAttributes
NamedReference[] filterAttributes()Returns attributes this scan can be filtered by at runtime.Spark will call
filter(Predicate[])if it can derive a runtime predicate for any of the filter attributes. -
filter
Filters this scan using runtime predicates.The provided expressions must be interpreted as a set of predicates that are ANDed together. Implementations may use the predicates to prune initially planned
InputPartitions.If the scan also implements
SupportsReportPartitioning, it must preserve the originally reported partitioning during runtime filtering. While applying runtime predicates, the scan may detect that someInputPartitions have no matching data, in which case it can either replace the initially plannedInputPartitions that have no matching data with emptyInputPartitions, or report only a subset of the original partition values (omitting those with no data) viaBatch.planInputPartitions(). The scan must not report new partition values that were not present in the original partitioning.This method may be called multiple times with additional predicates (e.g.
PartitionPredicate) whensupportsIterativePushdown()returns true. The implementation must accumulate state across all calls so thatpushedPredicates()can return predicates from all of them.Note that Spark will call
Scan.toBatch()again after filtering the scan at runtime.- Parameters:
predicates- data source V2 predicates used to filter the scan at runtime
-
pushedPredicates
Returns the predicates that are pushed to the data source viafilter(Predicate[]).When iterative filtering is supported and
filter(Predicate[])was called multiple times, this method must return predicates from all calls.It's possible that there are no runtime predicates and
filter(Predicate[])is never called; an empty array should be returned for this case.- Since:
- 4.2.0
-
supportsIterativePushdown
default boolean supportsIterativePushdown()Returns true if this scan supports iterative runtime filtering. When true,filter(Predicate[])may be called multiple times with additional predicates. The implementation must accumulate state across all calls, andpushedPredicates()must return predicates from all of them. See the class-level Javadoc for the full contract.When enabled, Spark will derive
PartitionPredicateinstances from the runtime filters and push them via a subsequentfilter(Predicate[])call.- Since:
- 4.2.0
-