Class PartitionPredicate

java.lang.Object
  org.apache.spark.sql.internal.connector.ExpressionWithToString
    org.apache.spark.sql.connector.expressions.GeneralScalarExpression
      org.apache.spark.sql.connector.expressions.filter.Predicate
        PartitionPredicate

All Implemented Interfaces:
Serializable, Expression

@Evolving
public abstract class PartitionPredicate extends Predicate

Represents a partition predicate that can be evaluated using Table.partitioning().

Connectors are expected to use partition predicates for partition pruning whenever they have the partition metadata needed to evaluate them. Use eval(InternalRow) to evaluate this predicate against a single partition's keys.
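The pruning step described above can be sketched as follows. This is a minimal, self-contained illustration that uses simplified stand-ins (a `PartitionKey` record for `InternalRow`, and `java.util.function.Predicate` in place of `PartitionPredicate.eval`), not the real Spark API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of connector-side partition pruning. PartitionKey stands in for
// InternalRow: one value per transform in Table.partitioning(), in order.
public class PartitionPruningSketch {
    public record PartitionKey(Object[] values) {}

    // Keep only the partitions whose full key satisfies the predicate;
    // in the real API this would call PartitionPredicate.eval(InternalRow).
    public static List<PartitionKey> prune(List<PartitionKey> partitions,
                                           Predicate<PartitionKey> predicate) {
        List<PartitionKey> kept = new ArrayList<>();
        for (PartitionKey key : partitions) {
            if (predicate.test(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<PartitionKey> partitions = List.of(
            new PartitionKey(new Object[] {2025, 12}),
            new PartitionKey(new Object[] {2026, 1}));
        // Stand-in for a predicate equivalent to years(ts) = 2026.
        List<PartitionKey> kept =
            prune(partitions, k -> (int) k.values()[0] == 2026);
        System.out.println(kept.size()); // prints 1
    }
}
```

The connector decides which partitions to scan before planning input splits; partitions whose keys fail the predicate are never read.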

Since:
4.2.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     

    Fields inherited from interface org.apache.spark.sql.connector.expressions.Expression

    EMPTY_EXPRESSION, EMPTY_NAMED_REFERENCE
  • Method Summary

    Modifier and Type
    Method
    Description
    abstract boolean
    eval(org.apache.spark.sql.catalyst.InternalRow partitionKey)
    Evaluates this predicate against a single partition's keys.
    abstract NamedReference[]
    references()
    List of fields or columns that are referenced by this expression.

    Methods inherited from class org.apache.spark.sql.connector.expressions.GeneralScalarExpression

    children, equals, hashCode, name

    Methods inherited from class org.apache.spark.sql.internal.connector.ExpressionWithToString

    describe, toString

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Field Details

  • Method Details

    • references

      public abstract NamedReference[] references()
      List of fields or columns that are referenced by this expression.

      For PartitionPredicate, returns PartitionFieldReference instances that identify the partition fields (from Table.partitioning()) referenced by this predicate. Each reference's NamedReference.fieldNames() gives the partition field name; PartitionFieldReference.ordinal() gives the 0-based position in Table.partitioning().

      Example: Suppose Table.partitioning() returns three partition transforms: [years(ts), months(ts), bucket(32, id)] with ordinals 0, 1, 2. Each PartitionFieldReference has NamedReference.fieldNames() (the transform display name, e.g. years(ts)) and PartitionFieldReference.ordinal():

      • years(ts) = 2026 returns one reference: (fieldNames=[years(ts)], ordinal=0).
      • years(ts) = 2026 and months(ts) = 01 returns two references: (fieldNames=[years(ts)], ordinal=0), (fieldNames=[months(ts)], ordinal=1).
      • bucket(32, id) = 1 returns one reference: (fieldNames=[bucket(32, id)], ordinal=2).

      Data sources can use these references to decide whether a predicate must be returned to Spark for post-scan filtering. For example, a source that supports partition spec evolution should return PartitionPredicates that reference later-added partition transforms (under which older data is incompletely partitioned) to Spark for post-scan filtering, while predicates that reference only the original partition transforms may be fully pushed down.
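That decision can be sketched with a simplified stand-in for PartitionFieldReference (name plus ordinal). The originalSpecSize parameter, the number of transforms in the table's initial partition spec, is an assumption introduced for illustration and is not part of this API:

```java
import java.util.List;

// Sketch of the push-down decision: a predicate may be fully pushed only if
// every partition field it references existed in the original partition spec.
public class PushDownDecisionSketch {
    // Simplified stand-in for PartitionFieldReference.
    public record FieldRef(String name, int ordinal) {}

    // Transforms added after the original spec (ordinal >= originalSpecSize)
    // leave older data incompletely partitioned, so Spark must re-apply the
    // predicate as a post-scan filter.
    public static boolean canFullyPush(List<FieldRef> references,
                                       int originalSpecSize) {
        return references.stream().allMatch(r -> r.ordinal() < originalSpecSize);
    }

    public static void main(String[] args) {
        // Suppose the original spec had 2 transforms, [years(ts), months(ts)],
        // and bucket(32, id) (ordinal 2) was added later.
        System.out.println(canFullyPush(
            List.of(new FieldRef("years(ts)", 0)), 2));      // prints true
        System.out.println(canFullyPush(
            List.of(new FieldRef("bucket(32, id)", 2)), 2)); // prints false
    }
}
```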

      Specified by:
      references in interface Expression
      Overrides:
      references in class org.apache.spark.sql.internal.connector.ExpressionWithToString
      Returns:
      array of partition field references
    • eval

      public abstract boolean eval(org.apache.spark.sql.catalyst.InternalRow partitionKey)
      Evaluates this predicate against a single partition's keys.

      The caller must pass the full partition key: one value per partition transform in Table.partitioning(), in order. A key for only a subset of referenced fields is not supported.
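      The full-key contract can be illustrated with a stand-in eval that takes a plain Object[] in place of InternalRow. Even though the hypothetical predicate below references only years(ts) (ordinal 0), the caller still supplies one value per transform in [years(ts), months(ts), bucket(32, id)]:

```java
// Sketch of the full-key contract, using Object[] as a stand-in for InternalRow.
public class FullKeySketch {
    // Stand-in for a predicate equivalent to years(ts) = 2026 over a table
    // partitioned by [years(ts), months(ts), bucket(32, id)].
    public static boolean eval(Object[] partitionKey) {
        if (partitionKey.length != 3) {
            // Partial keys are not supported: one value per transform, in order.
            throw new IllegalArgumentException(
                "expected one value per partition transform (3), got "
                    + partitionKey.length);
        }
        return (int) partitionKey[0] == 2026;
    }

    public static void main(String[] args) {
        System.out.println(eval(new Object[] {2026, 1, 7})); // prints true
        System.out.println(eval(new Object[] {2025, 1, 7})); // prints false
    }
}
```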

      Parameters:
      partitionKey - the full partition key for one partition, ordered according to Table.partitioning().
      Returns:
      true if the partition represented by these keys satisfies this predicate.