trait Changelog extends AnyRef

The central connector interface for Change Data Capture (CDC).

Connectors implement this minimal interface to expose change data. Spark handles post-processing (carry-over removal, update detection, net change computation) based on the properties declared by the connector.

The columns returned by #columns() must include the following metadata columns:

  • _change_type (STRING) — the kind of change: insert, delete, update_preimage, or update_postimage
  • _commit_version (connector-defined type, e.g. LONG) — the version containing this change
  • _commit_timestamp (TIMESTAMP) — the timestamp of the commit
Annotations
@Evolving()
Source
Changelog.java
Since

4.2.0

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def columns(): Array[Column]

    Returns the columns of this changelog, including data columns and the required metadata columns (_change_type, _commit_version, _commit_timestamp).
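
    Since this page documents only the contract, a concrete columns() depends on the connector. The sketch below uses a stand-in Column record (an illustrative assumption, not Spark's real column class) to show the shape of the result: the connector's data columns followed by the three required metadata columns.

    ```java
    import java.util.List;

    public class ChangelogColumns {
        // Stand-in for Spark's column type: a name plus a data type string.
        public record Column(String name, String dataType) {}

        // Sketch of a connector's columns(): data columns followed by the
        // required metadata columns (_change_type, _commit_version, _commit_timestamp).
        public static List<Column> columns() {
            return List.of(
                new Column("id", "LONG"),               // connector data column
                new Column("amount", "DECIMAL(10,2)"),  // connector data column
                new Column("_change_type", "STRING"),
                new Column("_commit_version", "LONG"),  // connector-defined type; LONG here
                new Column("_commit_timestamp", "TIMESTAMP"));
        }
    }
    ```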

  2. abstract def containsCarryoverRows(): Boolean

    Whether the raw change data may contain identical insert/delete carry-over pairs produced by copy-on-write file rewrites.

    When true and the CDC query's deduplicationMode is not none, Spark will remove carry-over pairs from the raw change data. If false, the connector guarantees that no carry-over pairs are present in the raw change data and Spark will skip carry-over removal entirely.
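
    The carry-over removal described above can be sketched in plain Java. The Row record and the pairing key (row identity, payload, commit version) are illustrative assumptions; the actual implementation operates on Spark's internal rows and, given rowId()/rowVersion(), need not compare full payloads. A carry-over pair is a delete and an insert in the same commit with identical content, so both halves are dropped:

    ```java
    import java.util.*;

    public class CarryoverRemoval {
        // Stand-in for a raw change row: row identity, payload, change type, commit version.
        public record Row(String id, String payload, String changeType, long commit) {}

        // Drop delete/insert pairs that are otherwise identical within one commit:
        // such pairs are artifacts of copy-on-write file rewrites, not real changes.
        public static List<Row> removeCarryover(List<Row> rows) {
            Set<List<Object>> del = new HashSet<>(), ins = new HashSet<>();
            for (Row r : rows) {
                List<Object> key = List.of(r.id(), r.payload(), r.commit());
                if (r.changeType().equals("delete")) del.add(key);
                if (r.changeType().equals("insert")) ins.add(key);
            }
            del.retainAll(ins); // keys with both halves present are carry-over pairs
            List<Row> out = new ArrayList<>();
            for (Row r : rows)
                if (!del.contains(List.of(r.id(), r.payload(), r.commit())))
                    out.add(r);
            return out;
        }
    }
    ```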

  3. abstract def containsIntermediateChanges(): Boolean

    Whether the raw change data may contain multiple intermediate states per row identity within the requested changelog range (across all commit versions in the range).

    When true and the CDC query's deduplicationMode is netChanges, Spark will collapse multiple changes per row identity into the net effect. If false, the connector guarantees at most one change per row identity across the entire changelog range, and Spark will skip net change computation.
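
    A simplified sketch of the net change computation, restricted to insert/delete rows (the real computation must also handle update pre/post-images): for each row identity, compare existence before the range with existence after it, and keep only the net effect.

    ```java
    import java.util.*;

    public class NetChanges {
        // Stand-in for a raw change row.
        public record Row(String id, String payload, String changeType, long commit) {}

        // Collapse each row identity's history across the range into its net effect.
        // A row that existed before the range starts its history with a delete; a row
        // that exists after the range ends its history with an insert.
        public static List<Row> netChanges(List<Row> rows) {
            Map<String, List<Row>> byId = new LinkedHashMap<>();
            for (Row r : rows)
                byId.computeIfAbsent(r.id(), k -> new ArrayList<>()).add(r);
            List<Row> out = new ArrayList<>();
            for (List<Row> history : byId.values()) {
                history.sort(Comparator.comparingLong(Row::commit));
                Row first = history.get(0), last = history.get(history.size() - 1);
                boolean existedBefore = first.changeType().equals("delete");
                boolean existsAfter = last.changeType().equals("insert");
                if (!existedBefore && existsAfter)
                    out.add(last);   // net insert
                else if (existedBefore && !existsAfter)
                    out.add(last);   // net delete
                else if (existedBefore && existsAfter
                         && !first.payload().equals(last.payload()))
                    out.add(last);   // net update, emitted here as the final state
                // inserted and deleted within the range: no net change
            }
            return out;
        }
    }
    ```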

  4. abstract def name(): String

    A name to identify this changelog.

  5. abstract def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder

    Returns a new ScanBuilder for reading the change data.

    options: read options (case-insensitive string map)

  6. abstract def representsUpdateAsDeleteAndInsert(): Boolean

    Whether updates in the raw change data are represented as delete+insert pairs rather than fully materialized update_preimage and update_postimage entries.

    When true and the CDC query's computeUpdates option is enabled, Spark will derive update_preimage/update_postimage from insert/delete pairs in the raw change data. If false, the connector guarantees that update pre/post-images are already present in the raw change data.
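
    The update detection described above can be sketched as a pairing pass (the Row record and the pairing key of row identity plus commit version are illustrative assumptions): a delete and an insert sharing a row identity within one commit become an update_preimage/update_postimage pair, and unpaired rows pass through unchanged.

    ```java
    import java.util.*;

    public class UpdateDetection {
        // Stand-in for a raw change row.
        public record Row(String id, String payload, String changeType, long commit) {}

        // Rewrite delete+insert pairs sharing a row identity within one commit into
        // update_preimage/update_postimage entries; unpaired rows pass through.
        public static List<Row> computeUpdates(List<Row> rows) {
            Set<List<Object>> deletes = new HashSet<>(), inserts = new HashSet<>();
            for (Row r : rows) {
                List<Object> key = List.of(r.id(), r.commit());
                if (r.changeType().equals("delete")) deletes.add(key);
                else if (r.changeType().equals("insert")) inserts.add(key);
            }
            List<Row> out = new ArrayList<>();
            for (Row r : rows) {
                List<Object> key = List.of(r.id(), r.commit());
                boolean paired = deletes.contains(key) && inserts.contains(key);
                if (paired && r.changeType().equals("delete"))
                    out.add(new Row(r.id(), r.payload(), "update_preimage", r.commit()));
                else if (paired && r.changeType().equals("insert"))
                    out.add(new Row(r.id(), r.payload(), "update_postimage", r.commit()));
                else
                    out.add(r);
            }
            return out;
        }
    }
    ```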

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  14. def rowId(): Array[NamedReference]

    Returns the columns that uniquely identify a row, used for carry-over removal, update detection, and net change computation.

    The default implementation throws UnsupportedOperationException. Connectors must override this method when any of #containsCarryoverRows(), #representsUpdateAsDeleteAndInsert(), or #containsIntermediateChanges() returns true. Each referenced column must be non-nullable.

  15. def rowVersion(): NamedReference

    Returns the column that holds the row version — the commit version at which the row's content was last modified. The row version has these properties:

    • Assigned the current commit version when the row is initially inserted.
    • Bumped to the current commit version when the row's content is updated.
    • Preserved when the row is rewritten by a copy-on-write operation without a content change — it is NOT bumped to the current commit version.

    The row version is distinct from _commit_version. _commit_version identifies the commit that emitted this change row; the row version identifies the commit that last wrote the row's content. For a delete+insert pair produced within a single commit, both halves share the same row version if the pair is a copy-on-write carry-over, and have different row versions (old on the delete, new on the insert) if the pair is a true update.

    Spark uses the row version to distinguish copy-on-write carry-over from update without scanning data columns, for both carry-over removal and update detection.

    The default implementation throws UnsupportedOperationException. Connectors must override this method when #containsCarryoverRows() or #representsUpdateAsDeleteAndInsert() returns true. The referenced column must be non-nullable.
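
    The carry-over-versus-update distinction described above reduces to a row-version comparison on a delete+insert pair. A minimal sketch, with a stand-in Row record carrying the row version as a field:

    ```java
    public class RowVersionCheck {
        // Stand-in change row: identity, change type, commit version, row version.
        public record Row(String id, String changeType, long commit, long rowVersion) {}

        // For a delete+insert pair within one commit: equal row versions mean a
        // copy-on-write carry-over (content unchanged); differing row versions mean
        // a true update (the delete carries the old version, the insert the new one).
        public static boolean isCarryover(Row deleteHalf, Row insertHalf) {
            if (!deleteHalf.id().equals(insertHalf.id())
                    || deleteHalf.commit() != insertHalf.commit())
                throw new IllegalArgumentException("rows do not form a pair");
            return deleteHalf.rowVersion() == insertHalf.rowVersion();
        }
    }
    ```

    No data columns are scanned: the check reads only the row identity, commit version, and row version.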

  16. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
