Interface SupportsRuntimeFiltering
- All Superinterfaces:
Scan, SupportsRuntimeV2Filtering

A mix-in interface for Scan. Data sources can implement this interface if they can
filter initially planned InputPartitions using predicates Spark infers at runtime.
Note that Spark will push runtime filters only if they are beneficial.
- Since:
- 3.2.0
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.sql.connector.read.Scan
Scan.ColumnarSupportMode
Method Summary

Modifier and Type | Method | Description
default void | filter(Predicate[] predicates) | Filters this scan using runtime predicates.
void | filter(Filter[] filters) | Filters this scan using runtime filters.
NamedReference[] | filterAttributes() | Returns attributes this scan can be filtered by at runtime.

Methods inherited from interface org.apache.spark.sql.connector.read.Scan
columnarSupportMode, description, readSchema, reportDriverMetrics, supportedCustomMetrics, toBatch, toContinuousStream, toMicroBatchStream

Methods inherited from interface org.apache.spark.sql.connector.read.SupportsRuntimeV2Filtering
pushedPredicates, supportsIterativePushdown
-
Method Details
-
filterAttributes
NamedReference[] filterAttributes()

Returns attributes this scan can be filtered by at runtime. Spark will call
filter(Filter[]) if it can derive a runtime predicate for any of the filter attributes.
- Specified by:
filterAttributes in interface SupportsRuntimeV2Filtering
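As a sketch of this contract, the scan below advertises a hypothetical "date" partition column as runtime-filterable. The `ColumnRef` record is a simplified stand-in for Spark's `NamedReference`, and the column name is an assumption for illustration, not part of the real API.

```java
import java.util.Arrays;

// Stand-in for Spark's NamedReference; illustrative only, not the real
// org.apache.spark.sql.connector.expressions.NamedReference.
record ColumnRef(String name) {}

// A scan over data partitioned by a hypothetical "date" column advertises
// that column as runtime-filterable, so Spark can derive a runtime
// predicate for it (e.g. from a join) and pass it back via filter().
class DatePartitionedScan {
    ColumnRef[] filterAttributes() {
        return new ColumnRef[] { new ColumnRef("date") };
    }
}

class FilterAttributesDemo {
    public static void main(String[] args) {
        ColumnRef[] attrs = new DatePartitionedScan().filterAttributes();
        System.out.println(Arrays.toString(attrs)); // the filterable columns
    }
}
```

Spark only calls `filter(Filter[])` when it can derive a runtime predicate for one of the returned attributes, so a scan that returns an empty array effectively opts out of runtime filtering.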
-
filter
void filter(Filter[] filters)

Filters this scan using runtime filters.

The provided expressions must be interpreted as a set of filters that are ANDed together. Implementations may use the filters to prune initially planned InputPartitions.

If the scan also implements SupportsReportPartitioning, it must preserve the originally reported partitioning during runtime filtering. While applying runtime filters, the scan may detect that some InputPartitions have no matching data, in which case it can either replace the initially planned InputPartitions that have no matching data with empty InputPartitions, or report only a subset of the original partition values (omitting those with no data) via Batch.planInputPartitions(). The scan must not report new partition values that were not present in the original partitioning.

Note that Spark will call Scan.toBatch() again after filtering the scan at runtime.
- Parameters:
filters - data source filters used to filter the scan at runtime
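The "replace pruned partitions with empty ones" option described above can be sketched as follows. The types are simplified stand-ins for the connector API (not the real Spark classes), and the IN-style runtime filter is modeled as a plain list of allowed partition values.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the connector types; illustrative only, not the
// real org.apache.spark.sql.connector.read.InputPartition API.
interface InputPartition {}
record PartitionDir(String partitionValue) implements InputPartition {}
record EmptyPartition(String partitionValue) implements InputPartition {}

// Sketch of the runtime-filtering contract: an IN-style runtime filter on a
// hypothetical partition column prunes the initially planned partitions, and
// each pruned partition is replaced with an empty placeholder so the
// originally reported partitioning (count and values) is preserved.
class PrunableScan {
    private List<InputPartition> planned;

    PrunableScan(List<String> partitionValues) {
        planned = new ArrayList<>();
        for (String v : partitionValues) planned.add(new PartitionDir(v));
    }

    // Analogue of filter(Filter[]): partitions whose value fails the
    // runtime filter become empty, rather than being dropped.
    void filter(List<String> allowedValues) {
        List<InputPartition> pruned = new ArrayList<>();
        for (InputPartition p : planned) {
            if (p instanceof PartitionDir d) {
                pruned.add(allowedValues.contains(d.partitionValue())
                        ? d : new EmptyPartition(d.partitionValue()));
            } else {
                pruned.add(p); // already emptied by an earlier call
            }
        }
        planned = pruned;
    }

    // Analogue of Batch.planInputPartitions() after runtime filtering:
    // same number of partitions, same partition values.
    List<InputPartition> planInputPartitions() { return planned; }
}

class RuntimeFilterDemo {
    public static void main(String[] args) {
        PrunableScan scan =
            new PrunableScan(List.of("2024-01-01", "2024-01-02", "2024-01-03"));
        scan.filter(List.of("2024-01-02")); // runtime filter derived by Spark
        System.out.println(scan.planInputPartitions());
    }
}
```

The alternative option, reporting only the surviving subset of partition values, would instead drop the pruned entries from the returned list; either way, no partition value absent from the original plan may appear in the result.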
-
filter
default void filter(Predicate[] predicates)

Description copied from interface: SupportsRuntimeV2Filtering

Filters this scan using runtime predicates.

The provided expressions must be interpreted as a set of predicates that are ANDed together. Implementations may use the predicates to prune initially planned InputPartitions.

If the scan also implements SupportsReportPartitioning, it must preserve the originally reported partitioning during runtime filtering. While applying runtime predicates, the scan may detect that some InputPartitions have no matching data, in which case it can either replace the initially planned InputPartitions that have no matching data with empty InputPartitions, or report only a subset of the original partition values (omitting those with no data) via Batch.planInputPartitions(). The scan must not report new partition values that were not present in the original partitioning.

This method may be called multiple times with additional predicates (e.g. PartitionPredicate) when SupportsRuntimeV2Filtering.supportsIterativePushdown() returns true. The implementation must accumulate state across all calls so that SupportsRuntimeV2Filtering.pushedPredicates() can return predicates from all of them.

Note that Spark will call Scan.toBatch() again after filtering the scan at runtime.
- Specified by:
filter in interface SupportsRuntimeV2Filtering
- Parameters:
predicates - data source V2 predicates used to filter the scan at runtime
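The accumulation requirement for iterative pushdown can be sketched as below. `SimplePredicate` is a stand-in for Spark's V2 `Predicate`, and the string descriptions are illustrative assumptions; the point is only that `pushedPredicates()` must reflect every `filter()` call, not just the latest one.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a data source V2 predicate; illustrative only, not the
// real org.apache.spark.sql.connector.expressions.filter.Predicate.
record SimplePredicate(String description) {}

// Sketch of the accumulation contract for iterative pushdown: when
// supportsIterativePushdown() returns true, filter() may be called more
// than once with additional predicates, and pushedPredicates() must
// report the predicates from every call made so far.
class IterativeFilterScan {
    private final List<SimplePredicate> pushed = new ArrayList<>();

    boolean supportsIterativePushdown() { return true; }

    // Analogue of filter(Predicate[]): accumulate, never replace.
    void filter(List<SimplePredicate> predicates) {
        pushed.addAll(predicates);
    }

    // Analogue of pushedPredicates(): predicates from all filter() calls.
    List<SimplePredicate> pushedPredicates() { return List.copyOf(pushed); }
}

class IterativeDemo {
    public static void main(String[] args) {
        IterativeFilterScan scan = new IterativeFilterScan();
        scan.filter(List.of(new SimplePredicate("id IN (1, 2)")));
        scan.filter(List.of(new SimplePredicate("date = '2024-01-02'")));
        System.out.println(scan.pushedPredicates().size()); // prints 2
    }
}
```

A scan that overwrote its pushed-predicate state on each call would report an incomplete set from `pushedPredicates()`, violating the contract described above.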
-