Interface SupportsV1OverwriteWithSaveAsTable

All Superinterfaces:
TableProvider

@Evolving public interface SupportsV1OverwriteWithSaveAsTable extends TableProvider
A marker interface that can be mixed into a TableProvider to indicate that the data source needs to distinguish between DataFrameWriter V1 saveAsTable operations and DataFrameWriter V2 createOrReplace/replace operations.

Background: DataFrameWriter V1's saveAsTable with SaveMode.Overwrite creates a ReplaceTableAsSelect logical plan, which is identical to the plan created by DataFrameWriter V2's createOrReplace. However, the documented semantics can have different interpretations:

  • V1 saveAsTable with Overwrite: "if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame" - does not define behavior for metadata (schema) overwriting
  • V2 createOrReplace: "The output table's schema, partition layout, properties, and other configuration will be based on the contents of the data frame... If the table exists, its configuration and data will be replaced"

Data sources that migrated from V1 to V2 may have adopted different behaviors based on these documented semantics. For example, Delta Lake interprets V1 saveAsTable as not replacing the table schema unless the overwriteSchema option is explicitly set.

When a TableProvider implements this interface and addV1OverwriteWithSaveAsTableOption() returns true, DataFrameWriter V1 adds an internal write option to indicate that the command originated from the saveAsTable API. The option key is defined by OPTION_NAME and the value is set to "true". This allows the data source to distinguish between the two APIs and apply the appropriate semantics.
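
The check a data source might perform on the write options can be sketched as follows. This is a minimal illustration, not Spark or Delta Lake code: the options are modeled as a plain java.util.Map so the sketch runs without Spark, the key is assumed to equal the "__v1_save_as_table_overwrite" string quoted in the method description, and the class and method names (SaveAsTableOverwriteSketch, isV1SaveAsTable, shouldReplaceSchema) are hypothetical.

```java
import java.util.Map;

// Sketch only: write options are modeled as a plain Map so this runs without Spark.
final class SaveAsTableOverwriteSketch {
    // Assumption: this matches the value of OPTION_NAME, as quoted in the method description.
    static final String V1_OVERWRITE_OPTION = "__v1_save_as_table_overwrite";
    // Connector-level option, modeled on the overwriteSchema example in the text above.
    static final String OVERWRITE_SCHEMA = "overwriteSchema";

    /** Returns true when the write was tagged as a V1 saveAsTable overwrite. */
    static boolean isV1SaveAsTable(Map<String, String> options) {
        return "true".equalsIgnoreCase(options.get(V1_OVERWRITE_OPTION));
    }

    /** Decides whether the existing table schema should be replaced by this write. */
    static boolean shouldReplaceSchema(Map<String, String> options) {
        if (!isV1SaveAsTable(options)) {
            // V2 createOrReplace/replace: configuration and data are both replaced.
            return true;
        }
        // V1 saveAsTable: keep the schema unless the caller opted in explicitly.
        return Boolean.parseBoolean(options.get(OVERWRITE_SCHEMA));
    }
}
```

With this sketch, an options map without the internal key is treated as a V2 replace, while a map carrying the key replaces the schema only when overwriteSchema is also "true". How a real connector obtains its write options depends on its own write path.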

Since:
4.1.0
  • Field Summary

    Fields
    Modifier and Type   | Field       | Description
    static final String | OPTION_NAME | The name of the internal write option that indicates the command originated from the DataFrameWriter V1 saveAsTable API.
  • Method Summary

    Modifier and Type | Method                                | Description
    default boolean   | addV1OverwriteWithSaveAsTableOption() | Returns whether to add the "__v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite.

    Methods inherited from interface org.apache.spark.sql.connector.catalog.TableProvider

    getTable, inferPartitioning, inferSchema, supportsExternalMetadata
  • Field Details

    • OPTION_NAME

      static final String OPTION_NAME
      The name of the internal write option that indicates the command originated from the DataFrameWriter V1 saveAsTable API.
  • Method Details

    • addV1OverwriteWithSaveAsTableOption

      default boolean addV1OverwriteWithSaveAsTableOption()
      Returns whether to add the "__v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite. Implementations can override this method to control when the option is added.
      Returns:
      true if the option should be added (default), false otherwise
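
Overriding the default can be sketched as below. To keep the example runnable without Spark, it uses a local stand-in interface (V1OverwriteMarkerStandIn) that only mirrors the documented default method; it is not the real SupportsV1OverwriteWithSaveAsTable, and the provider class and its distinguishV1Overwrite flag are hypothetical.

```java
// Local stand-in so the sketch compiles without Spark; this is NOT the real interface.
interface V1OverwriteMarkerStandIn {
    default boolean addV1OverwriteWithSaveAsTableOption() {
        return true; // documented default: the option is added
    }
}

// Hypothetical provider that decides from its own configuration flag.
final class ConfigurableProviderSketch implements V1OverwriteMarkerStandIn {
    private final boolean distinguishV1Overwrite;

    ConfigurableProviderSketch(boolean distinguishV1Overwrite) {
        this.distinguishV1Overwrite = distinguishV1Overwrite;
    }

    @Override
    public boolean addV1OverwriteWithSaveAsTableOption() {
        // Returning false asks for the internal option to be left out.
        return distinguishV1Overwrite;
    }
}
```

A real implementation would declare `implements SupportsV1OverwriteWithSaveAsTable` and also provide the TableProvider methods listed in the inherited-methods section.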