Interface TableCatalog
- All Superinterfaces:
CatalogPlugin
- All Known Subinterfaces:
CatalogExtension, StagingTableCatalog
- All Known Implementing Classes:
DelegatingCatalogExtension
TableCatalog implementations may be case-sensitive or case-insensitive. Spark will pass
table identifiers without modification. Field names passed to
alterTable(Identifier, TableChange...) will be normalized to match the case used in the
table schema when updating, renaming, or dropping existing columns, if Catalyst analysis
is case-insensitive.
- Since:
- 3.0.0
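As an illustration of the case-insensitive field-name handling described above, here is a minimal sketch in plain Java of how a user-supplied column name can be normalized to the spelling stored in the table schema before a change is applied. This is not the Spark API; the class and method names are hypothetical.

```java
import java.util.List;

public class FieldNameNormalizer {
    // Return the schema's spelling of userName (case-insensitive match),
    // or null if the schema has no such field.
    public static String normalize(List<String> schemaFields, String userName) {
        for (String field : schemaFields) {
            if (field.equalsIgnoreCase(userName)) {
                return field; // keep the case used in the table schema
            }
        }
        return null;
    }
}
```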
-
Field Summary
Fields
- static final String OPTION_PREFIX: A prefix used to pass OPTIONS in table properties.
- static final String PROP_COLLATION: A reserved property to specify the collation of the table.
- static final String PROP_COMMENT: A reserved property to specify the description of the table.
- static final String PROP_EXTERNAL: A reserved property to specify that a table was created with EXTERNAL.
- static final String PROP_IS_MANAGED_LOCATION: A reserved property to indicate that the table location is managed, not user-specified.
- static final String PROP_LOCATION: A reserved property to specify the location of the table.
- static final String PROP_OWNER: A reserved property to specify the owner of the table.
- static final String PROP_PROVIDER: A reserved property to specify the provider of the table.
- static final String PROP_TABLE_TYPE: A reserved property that indicates table entity type (external, managed, view, etc.).
-
Method Summary
- Table alterTable(Identifier ident, TableChange... changes): Apply a set of changes to a table in the catalog.
- default Set<TableCatalogCapability> capabilities(): Return the set of capabilities for this TableCatalog.
- default Table createTable(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties): Deprecated.
- default Table createTable(Identifier ident, TableInfo tableInfo): Create a table in the catalog.
- default Table createTable(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties): Deprecated.
- default Table createTableLike(Identifier ident, TableInfo tableInfo, Table sourceTable): Create a table in the catalog by copying metadata from an existing source table.
- boolean dropTable(Identifier ident): Drop a table in the catalog.
- default void invalidateTable(Identifier ident): Invalidate cached table metadata for an identifier.
- Identifier[] listTables(String[] namespace): List the tables in a namespace from the catalog.
- default TableSummary[] listTableSummaries(String[] namespace): List the table summaries in a namespace from the catalog.
- default Changelog loadChangelog(Identifier ident, ChangelogInfo changelogInfo): Load a Changelog for the given table, representing the row-level changes within the range specified by changelogInfo.
- Table loadTable(Identifier ident): Load table metadata by identifier from the catalog.
- default Table loadTable(Identifier ident, long timestamp): Load table metadata at a specific time by identifier from the catalog.
- default Table loadTable(Identifier ident, String version): Load table metadata of a specific version by identifier from the catalog.
- default Table loadTable(Identifier ident, Set<TableWritePrivilege> writePrivileges): Load table metadata by identifier from the catalog.
- default boolean purgeTable(Identifier ident): Drop a table in the catalog and completely remove its data by skipping the trash even if it is supported.
- void renameTable(Identifier oldIdent, Identifier newIdent): Renames a table in the catalog.
- default boolean tableExists(Identifier ident): Test whether a table exists using an identifier from the catalog.
- default boolean useNullableQuerySchema(): If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ... AS SELECT ... and creating the table.
Methods inherited from interface org.apache.spark.sql.connector.catalog.CatalogPlugin
defaultNamespace, initialize, name
-
Field Details
-
PROP_LOCATION
A reserved property to specify the location of the table. The files of the table should be under this location. The location is a Hadoop Path string.
-
PROP_IS_MANAGED_LOCATION
A reserved property to indicate that the table location is managed, not user-specified. If this property is "true", the table is a managed table even if it has a location; for example, SHOW CREATE TABLE will not generate the LOCATION clause.
-
PROP_EXTERNAL
A reserved property to specify that a table was created with EXTERNAL.
-
PROP_TABLE_TYPE
A reserved property that indicates table entity type (external, managed, view, etc.).
-
PROP_COMMENT
A reserved property to specify the description of the table.
-
PROP_COLLATION
A reserved property to specify the collation of the table.
-
PROP_PROVIDER
A reserved property to specify the provider of the table.
-
PROP_OWNER
A reserved property to specify the owner of the table.
-
OPTION_PREFIX
A prefix used to pass OPTIONS in table properties.
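To illustrate how the prefix is used, here is a sketch in plain Java that splits OPTIONS out of a table-properties map. The "option." prefix matches the documented value of OPTION_PREFIX; the helper class itself is hypothetical, not part of the Spark API.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionExtractor {
    // Mirrors TableCatalog.OPTION_PREFIX.
    public static final String OPTION_PREFIX = "option.";

    // Collect entries whose keys carry the prefix, stripping the prefix
    // so the result is a plain OPTIONS map.
    public static Map<String, String> extractOptions(Map<String, String> properties) {
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (e.getKey().startsWith(OPTION_PREFIX)) {
                options.put(e.getKey().substring(OPTION_PREFIX.length()), e.getValue());
            }
        }
        return options;
    }
}
```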
-
-
Method Details
-
capabilities
default Set<TableCatalogCapability> capabilities()
- Returns:
- the set of capabilities for this TableCatalog
-
listTables
Identifier[] listTables(String[] namespace) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
List the tables in a namespace from the catalog. If the catalog supports views, this must return identifiers for only tables and not views.
- Parameters:
namespace - a multi-part namespace
- Returns:
- an array of Identifiers for tables
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the namespace does not exist (optional).
-
listTableSummaries
default TableSummary[] listTableSummaries(String[] namespace) throws org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
List the table summaries in a namespace from the catalog. This method should return all table entities from the catalog regardless of type (i.e., views should be listed as well).
- Parameters:
namespace - a multi-part namespace
- Returns:
- an array of table summaries
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the namespace does not exist (optional).
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If a table listed by the listTables API does not exist.
-
loadTable
Table loadTable(Identifier ident) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Load table metadata by identifier from the catalog. If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- Parameters:
ident - a table identifier
- Returns:
- the table's metadata
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist or is a view
-
loadTable
default Table loadTable(Identifier ident, Set<TableWritePrivilege> writePrivileges) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Load table metadata by identifier from the catalog. Spark will write data into this table later. If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- Parameters:
ident - a table identifier
writePrivileges -
- Returns:
- the table's metadata
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist or is a view
- Since:
- 3.5.3
-
loadTable
default Table loadTable(Identifier ident, String version) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Load table metadata of a specific version by identifier from the catalog. If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- Parameters:
ident - a table identifier
version - version of the table
- Returns:
- the table's metadata
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist or is a view
-
loadTable
default Table loadTable(Identifier ident, long timestamp) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Load table metadata at a specific time by identifier from the catalog. If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- Parameters:
ident - a table identifier
timestamp - timestamp of the table, which is microseconds since 1970-01-01 00:00:00 UTC
- Returns:
- the table's metadata
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist or is a view
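Since the timestamp parameter is microseconds since the Unix epoch, a catalog implementation will typically convert it to its own time-travel representation. A small self-contained sketch of that conversion (the helper class is hypothetical):

```java
import java.time.Instant;

public class MicrosToInstant {
    // Convert microseconds since 1970-01-01 00:00:00 UTC to an Instant,
    // handling negative values (pre-epoch timestamps) correctly.
    public static Instant fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long microOfSecond = Math.floorMod(micros, 1_000_000L);
        return Instant.ofEpochSecond(seconds, microOfSecond * 1_000L);
    }
}
```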
-
loadChangelog
default Changelog loadChangelog(Identifier ident, ChangelogInfo changelogInfo) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Load a Changelog for the given table, representing the row-level changes within the range specified by changelogInfo. The default implementation throws an analysis exception indicating that the catalog does not support CDC. Catalogs that support CDC must override this method.
- Parameters:
ident - a table identifier
changelogInfo - the CDC query parameters (range, deduplication mode, etc.)
- Returns:
- a Changelog instance for the requested table and range
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist
- Since:
- 4.2.0
-
invalidateTable
default void invalidateTable(Identifier ident)
Invalidate cached table metadata for an identifier. If the table is already loaded or cached, drop cached data. If the table does not exist or is not cached, do nothing. Calling this method should not query remote services.
- Parameters:
ident - a table identifier
-
tableExists
default boolean tableExists(Identifier ident)
Test whether a table exists using an identifier from the catalog. If the catalog supports views and contains a view for the identifier and not a table, this must return false.
- Parameters:
ident - a table identifier
- Returns:
- true if the table exists, false otherwise
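One common way to provide an existence check is to attempt a load and treat a missing-table exception as "does not exist". The sketch below shows that pattern with simplified stand-in types; it is not the Spark classes themselves.

```java
public class ExistsCheck {
    // Stand-in for org.apache.spark.sql.catalyst.analysis.NoSuchTableException.
    public static class NoSuchTableException extends Exception {}

    // Stand-in for the catalog's loadTable entry point.
    public interface Loader {
        Object loadTable(String ident) throws NoSuchTableException;
    }

    // Exists-via-load: a successful load means the table exists; a
    // NoSuchTableException means it does not.
    public static boolean tableExists(Loader catalog, String ident) {
        try {
            catalog.loadTable(ident);
            return true;
        } catch (NoSuchTableException e) {
            return false;
        }
    }
}
```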
-
createTable
@Deprecated(since="3.4.0")
default Table createTable(Identifier ident, StructType schema, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
Deprecated. Please override createTable(Identifier, Column[], Transform[], Map) instead.
Create a table in the catalog.
- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
-
createTable
@Deprecated(since="4.1.0")
default Table createTable(Identifier ident, Column[] columns, Transform[] partitions, Map<String, String> properties) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
Deprecated. Please override createTable(Identifier, TableInfo) instead.
Create a table in the catalog.
- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
-
createTable
default Table createTable(Identifier ident, TableInfo tableInfo) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
Create a table in the catalog.
- Parameters:
ident - a table identifier
tableInfo - information about the table
- Returns:
- metadata for the new table. This can be null if getting the metadata for the new table is expensive. Spark will call loadTable(Identifier) if needed (e.g. CTAS).
- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException - If a table or view already exists for the identifier
UnsupportedOperationException - If a requested partition transform is not supported
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
- Since:
- 4.1.0
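An in-memory sketch of the createTable contract in plain Java: fail if the identifier is already taken, otherwise register the new table. The types here are simplified stand-ins for the Spark connector classes, and the map-backed "catalog" is hypothetical.

```java
import java.util.Map;

public class CreateSketch {
    // Register tableInfo under ident; reject duplicates. A real catalog
    // would throw TableAlreadyExistsException; IllegalStateException is a
    // stand-in here.
    public static Object createTable(Map<String, Object> tables, String ident, Object tableInfo) {
        if (tables.containsKey(ident)) {
            throw new IllegalStateException("TableAlreadyExists: " + ident);
        }
        tables.put(ident, tableInfo);
        // May also return null if building metadata is expensive; Spark
        // would then call loadTable(Identifier) when it needs the table.
        return tableInfo;
    }
}
```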
-
createTableLike
default Table createTableLike(Identifier ident, TableInfo tableInfo, Table sourceTable) throws org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException, org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
Create a table in the catalog by copying metadata from an existing source table. This method is called for CREATE TABLE ... LIKE ... statements targeting this catalog. The tableInfo parameter contains all the explicit information for the new table: columns and partitioning copied from the source, any constraints copied from the source, user-specified TBLPROPERTIES / LOCATION / USING provider (if given), and PROP_OWNER set to the current user. Source table properties are intentionally excluded from tableInfo; connectors may read sourceTable.properties() to clone additional format-specific or custom state as appropriate for their implementation.
The default implementation throws UnsupportedOperationException. Connectors that support CREATE TABLE ... LIKE ... must override this method.
- Parameters:
ident - a table identifier for the new table
tableInfo - complete description of the new table: columns, partitioning, constraints, explicit properties (user overrides + owner); source table properties are NOT included
sourceTable - the resolved source table; connectors may read format-specific properties or other custom state from this object to clone additional metadata
- Returns:
- metadata for the new table
- Throws:
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException - If a table or view already exists for the identifier
org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException - If the identifier namespace does not exist (optional)
UnsupportedOperationException - If the catalog does not support CREATE TABLE LIKE
- Since:
- 4.2.0
-
useNullableQuerySchema
default boolean useNullableQuerySchema()
If true, mark all the fields of the query schema as nullable when executing CREATE/REPLACE TABLE ... AS SELECT ... and creating the table.
-
alterTable
Table alterTable(Identifier ident, TableChange... changes) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Apply a set of changes to a table in the catalog. Implementations may reject the requested changes. If any change is rejected, none of the changes should be applied to the table.
The requested changes must be applied in the order given.
If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- Parameters:
ident - a table identifier
changes - changes to apply to the table
- Returns:
- updated metadata for the table. This can be null if getting the metadata for the updated table is expensive. Spark always discards the returned table here.
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table doesn't exist or is a view
IllegalArgumentException - If any change is rejected by the implementation.
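The all-or-nothing requirement above (if any change is rejected, none may be applied) can be sketched in plain Java: validate every requested change before mutating anything, then apply them in order. The "change" here is simplified to an add-column name; TableChange and the helper class are stand-ins, not the Spark API.

```java
import java.util.ArrayList;
import java.util.List;

public class AtomicAlter {
    // Apply a list of add-column changes to a column list, atomically:
    // validate all changes first, then build the updated state in order.
    public static List<String> applyChanges(List<String> columns, List<String> addColumns) {
        // Validation pass: reject the whole batch if any change is invalid.
        for (String col : addColumns) {
            if (columns.contains(col)) {
                throw new IllegalArgumentException("Cannot add existing column: " + col);
            }
        }
        // Apply pass: the original list is never mutated, so a rejected
        // batch leaves the table untouched.
        List<String> updated = new ArrayList<>(columns);
        updated.addAll(addColumns);
        return updated;
    }
}
```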
-
dropTable
boolean dropTable(Identifier ident)
Drop a table in the catalog. If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
- Parameters:
ident - a table identifier
- Returns:
- true if a table was deleted, false if no table exists for the identifier
-
purgeTable
default boolean purgeTable(Identifier ident)
Drop a table in the catalog and completely remove its data by skipping the trash, even if trash is supported. If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
If the catalog supports purging a table, this method should be overridden. The default implementation throws UnsupportedOperationException.
- Parameters:
ident - a table identifier
- Returns:
- true if a table was deleted, false if no table exists for the identifier
- Throws:
UnsupportedOperationException - If table purging is not supported
- Since:
- 3.1.0
-
renameTable
void renameTable(Identifier oldIdent, Identifier newIdent) throws org.apache.spark.sql.catalyst.analysis.NoSuchTableException, org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
Renames a table in the catalog. If the catalog supports views and contains a view for the old identifier and not a table, this throws NoSuchTableException. Additionally, if the new identifier is a table or a view, this throws TableAlreadyExistsException.
If the catalog does not support table renames between namespaces, it throws UnsupportedOperationException.
- Parameters:
oldIdent - the table identifier of the existing table to rename
newIdent - the new table identifier of the table
- Throws:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException - If the table to rename doesn't exist or is a view
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException - If the new table name already exists or is a view
UnsupportedOperationException - If the namespaces of old and new identifiers do not match (optional)
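The rename contract (old identifier must exist, new identifier must be free) can be sketched with an in-memory map in plain Java. IllegalStateException stands in for NoSuchTableException / TableAlreadyExistsException; the class is hypothetical.

```java
import java.util.Map;

public class RenameSketch {
    // Move the entry for oldIdent to newIdent, enforcing the renameTable
    // preconditions before mutating anything.
    public static void renameTable(Map<String, Object> tables, String oldIdent, String newIdent) {
        if (!tables.containsKey(oldIdent)) {
            throw new IllegalStateException("NoSuchTable: " + oldIdent);
        }
        if (tables.containsKey(newIdent)) {
            throw new IllegalStateException("TableAlreadyExists: " + newIdent);
        }
        tables.put(newIdent, tables.remove(oldIdent));
    }
}
```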
-