Interface SupportsV1OverwriteWithSaveAsTable
All Superinterfaces: TableProvider
A mix-in interface for a TableProvider, indicating that the data
source needs to distinguish between DataFrameWriter V1 saveAsTable operations and
DataFrameWriter V2 createOrReplace/replace operations.
Background: DataFrameWriter V1's saveAsTable with SaveMode.Overwrite creates
a ReplaceTableAsSelect logical plan, which is identical to the plan created by
DataFrameWriter V2's createOrReplace. However, the documented semantics can have
different interpretations:
- V1 saveAsTable with Overwrite: "if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame" - does not define behavior for metadata (schema) overwriting
- V2 createOrReplace: "The output table's schema, partition layout, properties, and other configuration will be based on the contents of the data frame... If the table exists, its configuration and data will be replaced"
Data sources that migrated from V1 to V2 may have adopted different behaviors based on these
documented semantics. For example, Delta Lake interprets V1 saveAsTable as not replacing the
table schema unless the overwriteSchema option is explicitly set.
When a TableProvider implements this interface and
addV1OverwriteWithSaveAsTableOption() returns true, DataFrameWriter V1 adds an
internal write option to indicate that the command originated from the saveAsTable API.
The option key is defined by OPTION_NAME and the value is set to "true".
This allows the data source to distinguish between the two APIs and apply appropriate
semantics.
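As a minimal sketch of the data-source side, the following models how a connector might branch on this option. It uses a plain java.util.Map in place of Spark's write options so that it runs without Spark on the classpath. The class name, the shouldReplaceSchema helper, and the overwriteSchema handling are illustrative assumptions (the last loosely following the Delta Lake behavior described above), and the key literal is copied from the method description on this page rather than read from OPTION_NAME.

```java
import java.util.Map;

public class SaveAsTableOptionCheck {
    // Assumed value of OPTION_NAME, copied from the method description on this page.
    public static final String OPTION_NAME = "__v1_save_as_table_overwrite";

    /** True when the write options carry the marker added by DataFrameWriter V1. */
    public static boolean isV1SaveAsTable(Map<String, String> options) {
        return "true".equalsIgnoreCase(options.get(OPTION_NAME));
    }

    /**
     * Hypothetical policy: V2 createOrReplace/replace always replaces the schema,
     * while V1 saveAsTable replaces it only if overwriteSchema is explicitly set.
     */
    public static boolean shouldReplaceSchema(Map<String, String> options) {
        if (!isV1SaveAsTable(options)) {
            return true; // treated as a V2 createOrReplace/replace
        }
        return "true".equalsIgnoreCase(options.get("overwriteSchema"));
    }

    public static void main(String[] args) {
        // No marker: handled as a V2 replace.
        System.out.println(shouldReplaceSchema(Map.of())); // prints true
        // Marker present, no overwriteSchema: keep the existing schema.
        System.out.println(shouldReplaceSchema(Map.of(OPTION_NAME, "true"))); // prints false
        // Marker present and overwriteSchema set: replace the schema.
        System.out.println(shouldReplaceSchema(
            Map.of(OPTION_NAME, "true", "overwriteSchema", "true"))); // prints true
    }
}
```

A real connector would read the same key from the options it receives from Spark. What it does with the result is a design choice of the data source, not something this interface prescribes.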
Since: 4.1.0
Field Summary
Fields
Modifier and Type | Field | Description
static final String | OPTION_NAME | The name of the internal write option that indicates the command originated from the DataFrameWriter V1 saveAsTable API.
Method Summary
Modifier and Type | Method | Description
default boolean | addV1OverwriteWithSaveAsTableOption() | Returns whether to add the "__v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite.
Methods inherited from interface org.apache.spark.sql.connector.catalog.TableProvider
getTable, inferPartitioning, inferSchema, supportsExternalMetadata
Field Details
OPTION_NAME
The name of the internal write option that indicates the command originated from the DataFrameWriter V1 saveAsTable API.
Method Details
addV1OverwriteWithSaveAsTableOption
default boolean addV1OverwriteWithSaveAsTableOption()
Returns whether to add the "__v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite. Implementations can override this method to control when the option is added.
Returns: true if the option should be added (default), false otherwise
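The override described above can be sketched as follows. This is a self-contained model, not Spark's code: Provider and V1OverwriteMarker are hypothetical stand-ins for TableProvider and this interface, so that the example compiles without Spark on the classpath. The ConfigurableSource flag and the writeOptionsFor helper are likewise illustrative assumptions about how the documented behavior might look.

```java
import java.util.HashMap;
import java.util.Map;

public class V1OverwriteOptionSketch {
    /** Hypothetical stand-in for TableProvider. */
    public interface Provider {}

    /** Hypothetical stand-in mirroring the interface documented on this page. */
    public interface V1OverwriteMarker extends Provider {
        String OPTION_NAME = "__v1_save_as_table_overwrite"; // assumed value

        default boolean addV1OverwriteWithSaveAsTableOption() {
            return true;
        }
    }

    /** Keeps the default, so the option is added. */
    public static class DefaultSource implements V1OverwriteMarker {}

    /** Overrides the method; the constructor flag is an illustrative assumption. */
    public static class ConfigurableSource implements V1OverwriteMarker {
        private final boolean enabled;

        public ConfigurableSource(boolean enabled) {
            this.enabled = enabled;
        }

        @Override
        public boolean addV1OverwriteWithSaveAsTableOption() {
            return enabled;
        }
    }

    /** A provider that does not implement the marker interface. */
    public static class PlainSource implements Provider {}

    /** Simplified model of the documented writer behavior for saveAsTable with Overwrite. */
    public static Map<String, String> writeOptionsFor(
            Provider provider, Map<String, String> userOptions) {
        Map<String, String> options = new HashMap<>(userOptions);
        if (provider instanceof V1OverwriteMarker
                && ((V1OverwriteMarker) provider).addV1OverwriteWithSaveAsTableOption()) {
            options.put(V1OverwriteMarker.OPTION_NAME, "true");
        }
        return options;
    }

    public static void main(String[] args) {
        String key = V1OverwriteMarker.OPTION_NAME;
        System.out.println(writeOptionsFor(new DefaultSource(), Map.of()).containsKey(key)); // prints true
        System.out.println(writeOptionsFor(new ConfigurableSource(false), Map.of()).containsKey(key)); // prints false
        System.out.println(writeOptionsFor(new PlainSource(), Map.of()).containsKey(key)); // prints false
    }
}
```

Returning false from the override leaves the write options untouched, which in this model makes a V1 saveAsTable overwrite indistinguishable from a V2 createOrReplace.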