trait SupportsV1OverwriteWithSaveAsTable extends TableProvider
A marker interface that can be mixed into a TableProvider to indicate that the data
source needs to distinguish between DataFrameWriter V1 saveAsTable operations and
DataFrameWriter V2 createOrReplace/replace operations.
Background: DataFrameWriter V1's saveAsTable with SaveMode.Overwrite creates
a ReplaceTableAsSelect logical plan, which is identical to the plan created by
DataFrameWriter V2's createOrReplace. However, the documented semantics can have
different interpretations:
- V1 saveAsTable with Overwrite: "if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame" - does not define behavior for metadata (schema) overwriting
- V2 createOrReplace: "The output table's schema, partition layout, properties, and other configuration will be based on the contents of the data frame... If the table exists, its configuration and data will be replaced"
Data sources that migrated from V1 to V2 may have adopted different behaviors based on these
documented semantics. For example, Delta Lake interprets V1 saveAsTable to not replace table
schema unless the overwriteSchema option is explicitly set.
When a TableProvider implements this interface and
#addV1OverwriteWithSaveAsTableOption() returns true, DataFrameWriter V1 adds an
internal write option indicating that the command originated from the saveAsTable API.
The option key is defined by #OPTION_NAME and its value is set to "true".
This allows the data source to distinguish between the two APIs and apply the
appropriate semantics.
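As an illustration of the connector-side effect, here is a minimal, self-contained sketch (not Spark code) of how a data source's write path might branch on the option. The `shouldReplaceSchema` helper and the `overwriteSchema` option name are assumptions modeled on the Delta Lake example above; the authoritative option key is `SupportsV1OverwriteWithSaveAsTable.OPTION_NAME`.

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteSemantics {
    // Option key as named in addV1OverwriteWithSaveAsTableOption()'s description;
    // the authoritative constant is SupportsV1OverwriteWithSaveAsTable.OPTION_NAME.
    static final String OPTION_NAME = "v1_save_as_table_overwrite";

    // Hypothetical connector-side check: should a ReplaceTableAsSelect write
    // also replace the existing table's schema?
    static boolean shouldReplaceSchema(Map<String, String> writeOptions) {
        boolean fromV1SaveAsTable =
            "true".equalsIgnoreCase(writeOptions.getOrDefault(OPTION_NAME, "false"));
        boolean overwriteSchema =
            "true".equalsIgnoreCase(writeOptions.getOrDefault("overwriteSchema", "false"));
        // V2 createOrReplace always replaces the schema; a V1 saveAsTable
        // overwrite keeps it unless the user opted in (Delta Lake-style).
        return !fromV1SaveAsTable || overwriteSchema;
    }

    public static void main(String[] args) {
        Map<String, String> v1Write = new HashMap<>();
        v1Write.put(OPTION_NAME, "true");
        System.out.println(shouldReplaceSchema(v1Write));          // false
        System.out.println(shouldReplaceSchema(new HashMap<>()));  // true
    }
}
```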
- Annotations
- @Evolving()
- Source
- SupportsV1OverwriteWithSaveAsTable.java
- Since
4.1.0
- By Inheritance
- SupportsV1OverwriteWithSaveAsTable
- TableProvider
- AnyRef
- Any
Abstract Value Members
- abstract def getTable(schema: StructType, partitioning: Array[Transform], properties: Map[String, String]): Table
Return a Table instance with the specified table schema, partitioning and properties to do read/write. The returned table should report the same schema and partitioning as the specified ones, or Spark may fail the operation.
- schema
The specified table schema.
- partitioning
The specified table partitioning.
- properties
The specified table properties. It is case-preserving (it contains exactly what the user specified), and implementations are free to use it case-sensitively or case-insensitively. It should be able to identify a table, e.g. a file path, Kafka topic name, etc.
- Definition Classes
- TableProvider
- abstract def inferSchema(options: CaseInsensitiveStringMap): StructType
Infer the schema of the table identified by the given options.
- options
an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.
- Definition Classes
- TableProvider
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addV1OverwriteWithSaveAsTableOption(): Boolean
Returns whether to add the "v1_save_as_table_overwrite" option to write operations originating from DataFrameWriter V1 saveAsTable with mode Overwrite. Implementations can override this method to control when the option is added.
- returns
true if the option should be added (default), false otherwise
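A provider that wants to keep the pre-4.1 behavior can override this method to opt out. The sketch below is self-contained, so it declares a minimal stand-in for the Spark interface (the real one extends TableProvider); only the method name and its default of true are taken from this page.

```java
public class ProviderOverrideExample {
    // Minimal stand-in for Spark's interface so this sketch compiles on its
    // own; the real trait additionally extends TableProvider.
    interface SupportsV1OverwriteWithSaveAsTable {
        default boolean addV1OverwriteWithSaveAsTableOption() { return true; }
    }

    // A connector that opts out: Spark would then not tag V1 saveAsTable
    // overwrites, leaving them indistinguishable from V2 createOrReplace.
    static class LegacyProvider implements SupportsV1OverwriteWithSaveAsTable {
        @Override
        public boolean addV1OverwriteWithSaveAsTableOption() { return false; }
    }

    public static void main(String[] args) {
        SupportsV1OverwriteWithSaveAsTable byDefault =
            new SupportsV1OverwriteWithSaveAsTable() {};
        System.out.println(byDefault.addV1OverwriteWithSaveAsTableOption());           // true
        System.out.println(new LegacyProvider().addV1OverwriteWithSaveAsTableOption()); // false
    }
}
```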
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def inferPartitioning(options: CaseInsensitiveStringMap): Array[Transform]
Infer the partitioning of the table identified by the given options.
By default this method returns empty partitioning; please override it if this source supports partitioning.
- options
an immutable case-insensitive string-to-string map that can identify a table, e.g. file path, Kafka topic name, etc.
- Definition Classes
- TableProvider
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def supportsExternalMetadata(): Boolean
Returns true if the source has the ability of accepting external table metadata when getting tables. The external table metadata includes:
- For table reader: the user-specified schema from DataFrameReader/DataStreamReader and the schema/partitioning stored in the Spark catalog.
- For table writer: the schema of the input Dataframe of DataframeWriter/DataStreamWriter.
By default this method returns false, which means the schema and partitioning passed to getTable(StructType, Transform[], Map) are from the infer methods. Please override it if this source has expensive schema/partitioning inference and wants external table metadata to avoid inference.
- Definition Classes
- TableProvider
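A simplified, hypothetical model of this flow is sketched below; it is not Spark's actual resolution logic (which works with StructType and may reject a user-specified schema rather than silently ignore it), but it shows the key point: the external schema is only consulted when the provider declares support, so expensive inference can be skipped.

```java
import java.util.Optional;

public class ExternalMetadataFlow {
    // Hypothetical, simplified stand-in for TableProvider; schemas are plain
    // strings here purely for illustration.
    interface Provider {
        default boolean supportsExternalMetadata() { return false; }
        String inferSchema();                       // stands in for inferSchema(options)
        default String getTable(String schema) { return "Table[" + schema + "]"; }
    }

    // Engine-side selection: the external (user/catalog) schema is used only
    // when the provider declares support; otherwise it falls back to inference.
    static String resolveTable(Provider p, Optional<String> externalSchema) {
        String schema = (p.supportsExternalMetadata() && externalSchema.isPresent())
            ? externalSchema.get()
            : p.inferSchema();
        return p.getTable(schema);
    }

    public static void main(String[] args) {
        Provider inferring = () -> "inferred";      // lambda implements inferSchema()
        System.out.println(resolveTable(inferring, Optional.of("user"))); // Table[inferred]

        Provider external = new Provider() {
            @Override public boolean supportsExternalMetadata() { return true; }
            @Override public String inferSchema() {
                throw new IllegalStateException("expensive inference avoided");
            }
        };
        System.out.println(resolveTable(external, Optional.of("user")));  // Table[user]
    }
}
```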
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)