
org.apache.spark.sql.connector.catalog

SupportsV1OverwriteWithSaveAsTable

trait SupportsV1OverwriteWithSaveAsTable extends TableProvider

A marker interface that can be mixed into a TableProvider to indicate that the data source needs to distinguish between DataFrameWriter V1 saveAsTable operations and DataFrameWriter V2 createOrReplace/replace operations.

Background: DataFrameWriter V1's saveAsTable with SaveMode.Overwrite creates a ReplaceTableAsSelect logical plan, which is identical to the plan created by DataFrameWriter V2's createOrReplace. However, the documented semantics can have different interpretations:

  • V1 saveAsTable with Overwrite: "if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame" - does not define behavior for metadata (schema) overwriting
  • V2 createOrReplace: "The output table's schema, partition layout, properties, and other configuration will be based on the contents of the data frame... If the table exists, its configuration and data will be replaced"

Data sources that migrated from V1 to V2 may have adopted different behaviors based on these documented semantics. For example, Delta Lake interprets V1 saveAsTable as not replacing the table schema unless the overwriteSchema option is explicitly set.
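To illustrate the behavioral gap, a sketch of the two write paths against the same table (`df` and the table name `t` are placeholders; `delta` is used here only as an example of a source with V1-specific overwrite semantics):

```scala
import org.apache.spark.sql.SaveMode

// DataFrameWriter V1: documented to overwrite existing data; whether the
// schema is also replaced is source-specific (e.g. Delta Lake keeps the
// existing schema unless overwriteSchema is set).
df.write
  .format("delta")
  .mode(SaveMode.Overwrite)
  .saveAsTable("t")

// DataFrameWriter V2: documented to replace the table's schema, partition
// layout, properties and data with those of the DataFrame.
df.writeTo("t")
  .using("delta")
  .createOrReplace()
```

Both paths produce a ReplaceTableAsSelect plan, which is why the source needs an extra signal to tell them apart.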

When a TableProvider implements this interface and addV1OverwriteWithSaveAsTableOption() returns true, DataFrameWriter V1 adds an internal write option to indicate that the command originated from the saveAsTable API. The option key is defined by OPTION_NAME and the value is set to "true". This allows the data source to distinguish between the two APIs and apply the appropriate semantics.
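A minimal sketch of how a source might opt in and consume the injected option. `MyBaseTableProvider` and `isV1SaveAsTableOverwrite` are hypothetical names; the option key "v1_save_as_table_overwrite" is taken from the documentation of addV1OverwriteWithSaveAsTableOption() below, and where exactly the option surfaces in the source's write path is source-specific:

```scala
import org.apache.spark.sql.connector.catalog.SupportsV1OverwriteWithSaveAsTable
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Mix the marker trait into an existing TableProvider implementation.
class MyProvider extends MyBaseTableProvider with SupportsV1OverwriteWithSaveAsTable {
  // true is already the default; overridden here only for illustration.
  override def addV1OverwriteWithSaveAsTableOption(): Boolean = true
}

// Hypothetical helper for the source's write path: when the option is
// present and true, the write came from V1 saveAsTable(Overwrite), so the
// source can apply V1 semantics (e.g. keep the existing schema).
def isV1SaveAsTableOverwrite(writeOptions: CaseInsensitiveStringMap): Boolean =
  writeOptions.getBoolean("v1_save_as_table_overwrite", false)
```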

Annotations
@Evolving()
Source
SupportsV1OverwriteWithSaveAsTable.java
Since

4.1.0

Linear Supertypes
TableProvider, AnyRef, Any

Abstract Value Members

  1. abstract def getTable(schema: StructType, partitioning: Array[Transform], properties: Map[String, String]): Table

Return a Table instance with the specified table schema, partitioning and properties to do read/write. The returned table should report the same schema and partitioning as the specified ones, or Spark may fail the operation.

    schema

    The specified table schema.

    partitioning

    The specified table partitioning.

    properties

    The specified table properties. It's case preserving (contains exactly what users specified) and implementations are free to use it case sensitively or insensitively. It should be able to identify a table, e.g. file path, Kafka topic name, etc.

    Definition Classes
    TableProvider
  2. abstract def inferSchema(options: CaseInsensitiveStringMap): StructType

Infer the schema of the table identified by the given options.

    options

    an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.

    Definition Classes
    TableProvider
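The two abstract members above can be sketched as follows, under hypothetical assumptions: `ExampleProvider` uses a fixed one-column schema (a real source would derive it from the options, e.g. a file path or topic name), and `ExampleTable` stands in for the source's Table implementation:

```scala
import java.util.{Map => JMap}
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.{StringType, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class ExampleProvider extends TableProvider {
  // Hypothetical fixed schema; real sources infer it from the options.
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    new StructType().add("value", StringType)

  // Must return a table reporting the same schema/partitioning it was given.
  override def getTable(schema: StructType,
                        partitioning: Array[Transform],
                        properties: JMap[String, String]): Table =
    new ExampleTable(schema) // ExampleTable: the source's Table implementation
}
```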

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def addV1OverwriteWithSaveAsTableOption(): Boolean

Returns whether to add the "v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite. Implementations can override this method to control when the option is added.

    returns

    true if the option should be added (default), false otherwise

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  9. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  11. def inferPartitioning(options: CaseInsensitiveStringMap): Array[Transform]

Infer the partitioning of the table identified by the given options.

By default this method returns empty partitioning; please override it if this source supports partitioning.

    options

    an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.

    Definition Classes
    TableProvider
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  16. def supportsExternalMetadata(): Boolean

Returns true if the source has the ability of accepting external table metadata when getting tables. The external table metadata includes:

    • For table reader: user-specified schema from DataFrameReader/DataStreamReader and schema/partitioning stored in the Spark catalog.
    • For table writer: the schema of the input DataFrame of DataFrameWriter/DataStreamWriter.

    By default this method returns false, which means the schema and partitioning passed to getTable(StructType, Transform[], Map) are from the infer methods. Please override it if this source has expensive schema/partitioning inference and wants external table metadata to avoid inference.

    Definition Classes
    TableProvider
  17. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  18. def toString(): String
    Definition Classes
    AnyRef → Any
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
