Interface SupportsPushDownV2Filters

All Superinterfaces:
ScanBuilder

@Evolving public interface SupportsPushDownV2Filters extends ScanBuilder
A mix-in interface for ScanBuilder. Data sources can implement this interface to push down V2 Predicates to the data source and reduce the amount of data to be read. Note that this interface is preferred over SupportsPushDownFilters, which uses V1 Filter and is less efficient because of the internal-to-external data conversion.

Iterative pushdown: When supportsIterativePushdown() returns true, pushPredicates(Predicate[]) may be called multiple times on the same ScanBuilder instance with additional predicates (e.g. PartitionPredicate). The implementation must accumulate state across all calls, and pushedPredicates() must return predicates from all of them.

Since:
3.3.0
  • Method Details

    • pushPredicates

      Predicate[] pushPredicates(Predicate[] predicates)
      Pushes down predicates, and returns predicates that need to be evaluated after scanning. Any predicate that the data source cannot fully push down must be returned as-is so that Spark can evaluate it after the scan; the data source must not modify or drop such predicates.

      Rows should be returned from the data source if and only if all of the predicates match. That is, predicates must be interpreted as ANDed together.

      This method may be called multiple times with additional predicates (e.g. PartitionPredicate when supportsIterativePushdown() returns true). The implementation must accumulate state across all calls so that pushedPredicates() can return predicates from all of them.
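
      The return contract and the accumulation requirement above can be sketched as follows. This is a minimal illustration, not Spark code: Predicate and Builder are simplified stand-ins declared inside the block, and the rule that only "=" and ">" can be evaluated by the source is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: the nested types are simplified stand-ins, not Spark's classes. */
final class IterativePushdownSketch {

  /** Hypothetical stand-in for a V2 Predicate; only an operator name is modeled. */
  static final class Predicate {
    final String name;

    Predicate(String name) {
      this.name = name;
    }
  }

  /** Hypothetical builder with the three methods described on this page. */
  static final class Builder {
    // Grows across every pushPredicates call made on this instance.
    private final List<Predicate> pushed = new ArrayList<>();

    boolean supportsIterativePushdown() {
      return true;
    }

    Predicate[] pushPredicates(Predicate[] predicates) {
      List<Predicate> postScan = new ArrayList<>();
      for (Predicate p : predicates) {
        if (canEvaluate(p)) {
          pushed.add(p); // append; never replace what earlier calls pushed
        } else {
          postScan.add(p); // handed back unchanged for Spark to evaluate
        }
      }
      return postScan.toArray(new Predicate[0]);
    }

    Predicate[] pushedPredicates() {
      return pushed.toArray(new Predicate[0]);
    }

    // Assumption for illustration: this source can evaluate only "=" and ">".
    private static boolean canEvaluate(Predicate p) {
      return p.name.equals("=") || p.name.equals(">");
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder();
    Predicate[] first = builder.pushPredicates(
        new Predicate[] {new Predicate("="), new Predicate("LIKE")});
    Predicate[] second = builder.pushPredicates(new Predicate[] {new Predicate(">")});
    System.out.println("post-scan after first call: " + first.length);
    System.out.println("post-scan after second call: " + second.length);
    System.out.println("pushed in total: " + builder.pushedPredicates().length);
  }
}
```

      Keeping the pushed predicates in a list that is only ever appended to is one simple way to make a second call safe; a real implementation would also fold each accepted predicate into whatever scan state it builds.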

      For each PartitionPredicate, the implementation can use PartitionPredicate.references() (each PartitionFieldReference has PartitionFieldReference.ordinal()) to decide whether to return it for post-scan filtering. For example, data sources with partition spec evolution may return predicates that reference later-added partition transforms (incompletely partitioned data) so Spark evaluates them after the scan, while predicates that reference only initially-added partition transforms may be fully pushed.
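
      The ordinal-based decision described above might look like the sketch below. PartitionFieldReference and PartitionPredicate here are stand-ins that model only ordinal() and references(); they are not the Spark classes. The initialFieldCount parameter, and the idea that ordinals below it identify the initially-added partition transforms, are assumptions made for the example. The sketch also chooses not to record returned predicates as pushed, which is only one possible design.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: the nested types are simplified stand-ins, not Spark's classes. */
final class PartitionPushdownSketch {

  /** Stand-in that models only ordinal(). */
  static final class PartitionFieldReference {
    private final int ordinal;

    PartitionFieldReference(int ordinal) {
      this.ordinal = ordinal;
    }

    int ordinal() {
      return ordinal;
    }
  }

  /** Stand-in that models only references(). */
  static final class PartitionPredicate {
    private final PartitionFieldReference[] refs;

    PartitionPredicate(PartitionFieldReference... refs) {
      this.refs = refs;
    }

    PartitionFieldReference[] references() {
      return refs;
    }
  }

  static final class Builder {
    // Hypothetical: how many partition fields the table's initial spec had.
    private final int initialFieldCount;
    private final List<PartitionPredicate> pushed = new ArrayList<>();

    Builder(int initialFieldCount) {
      this.initialFieldCount = initialFieldCount;
    }

    PartitionPredicate[] pushPredicates(PartitionPredicate[] predicates) {
      List<PartitionPredicate> postScan = new ArrayList<>();
      for (PartitionPredicate p : predicates) {
        if (referencesOnlyInitialFields(p)) {
          pushed.add(p); // every file carries these fields: fully pushed
        } else {
          postScan.add(p); // touches a later-added field: Spark re-evaluates
        }
      }
      return postScan.toArray(new PartitionPredicate[0]);
    }

    PartitionPredicate[] pushedPredicates() {
      return pushed.toArray(new PartitionPredicate[0]);
    }

    // Assumption: ordinals below initialFieldCount belong to the initial spec.
    private boolean referencesOnlyInitialFields(PartitionPredicate p) {
      for (PartitionFieldReference ref : p.references()) {
        if (ref.ordinal() >= initialFieldCount) {
          return false;
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder(2);
    PartitionPredicate early = new PartitionPredicate(new PartitionFieldReference(0));
    PartitionPredicate late = new PartitionPredicate(new PartitionFieldReference(2));
    PartitionPredicate[] postScan =
        builder.pushPredicates(new PartitionPredicate[] {early, late});
    System.out.println("returned for post-scan filtering: " + postScan.length);
    System.out.println("fully pushed: " + builder.pushedPredicates().length);
  }
}
```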

    • pushedPredicates

      Predicate[] pushedPredicates()
      Returns the predicates that are pushed to the data source via pushPredicates(Predicate[]).

      There are 3 kinds of predicates:

      1. pushable predicates which don't need to be evaluated again after scanning.
      2. pushable predicates which still need to be evaluated after scanning, e.g. a Parquet row group predicate.
      3. non-pushable predicates.

      Cases 1 and 2 are both considered pushed predicates and should be returned by this method.

      When iterative pushdown is supported and pushPredicates(Predicate[]) was called multiple times, this method must return predicates from all calls.

      It is possible that the query has no predicates and pushPredicates(Predicate[]) is never called; in that case, an empty array should be returned.
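
      The three kinds of predicates and the empty-array case can be illustrated with the sketch below. The types are stand-ins rather than Spark classes, and having each predicate carry its own Support level is a simplification for the example; a real source would derive that classification from the predicate and its own capabilities.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: the nested types are simplified stand-ins, not Spark's classes. */
final class PushedPredicatesSketch {

  /** Hypothetical labels for kinds 1, 2, and 3 in the list above. */
  enum Support { FULL, BEST_EFFORT, NONE }

  /** Stand-in predicate that carries its own classification (a simplification). */
  static final class Predicate {
    final String name;
    final Support support;

    Predicate(String name, Support support) {
      this.name = name;
      this.support = support;
    }
  }

  static final class Builder {
    // Starts empty, so pushedPredicates() is safe before any push call.
    private final List<Predicate> pushed = new ArrayList<>();

    Predicate[] pushPredicates(Predicate[] predicates) {
      List<Predicate> postScan = new ArrayList<>();
      for (Predicate p : predicates) {
        if (p.support != Support.NONE) {
          pushed.add(p); // kinds 1 and 2 both count as pushed
        }
        if (p.support != Support.FULL) {
          postScan.add(p); // kinds 2 and 3 must still run after the scan
        }
      }
      return postScan.toArray(new Predicate[0]);
    }

    Predicate[] pushedPredicates() {
      return pushed.toArray(new Predicate[0]); // empty array, never null
    }
  }

  public static void main(String[] args) {
    Builder untouched = new Builder();
    System.out.println("pushed before any call: " + untouched.pushedPredicates().length);

    Builder builder = new Builder();
    Predicate[] postScan = builder.pushPredicates(new Predicate[] {
        new Predicate("partition equality", Support.FULL),
        new Predicate("row group min/max", Support.BEST_EFFORT),
        new Predicate("unsupported function", Support.NONE)});
    System.out.println("post-scan: " + postScan.length);
    System.out.println("pushed: " + builder.pushedPredicates().length);
  }
}
```

      Note that a kind 2 predicate appears in both arrays in this sketch: it is reported as pushed and is also returned from pushPredicates so that it is evaluated again after the scan.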

    • supportsIterativePushdown

      default boolean supportsIterativePushdown()
      Returns true if this data source supports iterative filter pushdown. When true, pushPredicates(Predicate[]) may be called multiple times with additional predicates (e.g. PartitionPredicate). The implementation must accumulate state across all calls, and pushedPredicates() must return predicates from all of them. See the class-level Javadoc for the full contract.
      Since:
      4.2.0