Change Data Capture (CDC)
Tacnode supports logical replication, which lets consumers capture data changes through Change Data Capture (CDC). This page explains how to capture changes efficiently using two core concepts: publications and replication slots.
Core Concepts
- Publication
Defines the tables and change types (INSERT, UPDATE, DELETE) to be replicated from the database. Supports creating publications for specific tables or for all tables.
- Replication Slot
Server-side mechanism that tracks the consumer’s replication progress, ensuring that Write-Ahead Log (WAL) entries are not deleted before consumers acknowledge receipt of changes. Guarantees at-least-once delivery for consumers.
Other Concepts
- Logical Decoding
Converts database Write-Ahead Log (WAL), typically stored in an internal format, into a client-readable format in real time as data is written to the database.
- Decoding Plugins
Determine the output format of the data stream. Tacnode supports two output plugins: test_decoding and pgoutput. test_decoding outputs a SQL-like textual format intended for testing and validation. pgoutput provides a more performant binary format that complies with the PostgreSQL logical replication protocol; decoding is required on the client side. Flink CDC uses the pgoutput format.
For more information on encoding protocols, see Logical Replication Message Formats.
Prerequisites
To enable the CDC feature at the database level, set WAL_LEVEL to logical (higher than the default replica level). Changing WAL_LEVEL requires re-establishing the connection for the change to take effect.
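A minimal sketch of this step, assuming Tacnode accepts the PostgreSQL-style `ALTER DATABASE ... SET` form for this parameter (the exact statement is not given here). The database name `mydb` is a placeholder.

```sql
-- Assumption: WAL_LEVEL can be set per database with ALTER DATABASE ... SET.
-- "mydb" is a placeholder database name.
ALTER DATABASE mydb SET wal_level = 'logical';

-- After reconnecting, confirm that the new level is in effect.
SHOW wal_level;
```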
For scenarios where CDC consumption is based on the Debezium protocol, such as with the Apache Flink CDC Connector, the table’s REPLICA IDENTITY attribute should be set to FULL. The REPLICA IDENTITY attribute determines how CDC tracks and identifies data changes.
REPLICA IDENTITY FULL ensures that CDC captures the full "before image" of each row, which is especially important in the following cases:
- Tables without a primary key
- Requirement to capture complete data for UPDATE and DELETE operations
- Need to guarantee change events include values of all columns
If FULL is not set, change events for UPDATE and DELETE operations include only the primary key or other limited information. Setting REPLICA IDENTITY FULL increases WAL volume and may impact performance for large tables.
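As a sketch, the attribute can be set and checked with standard PostgreSQL statements. The table name `orders` is hypothetical, and the query assumes Tacnode exposes the `pg_class.relreplident` catalog column as PostgreSQL does.

```sql
-- "orders" is a hypothetical table name.
ALTER TABLE orders REPLICA IDENTITY FULL;

-- Check the current setting:
-- 'f' = full, 'd' = default (primary key), 'i' = index, 'n' = nothing.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'orders';
```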
Publication
A publication is the core mechanism for subscribing to table changes in Tacnode. A publication defines a set of table changes that subscribers can then consume for real-time data synchronization and distribution.
Create Publication
Create a Publication
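A minimal sketch using standard PostgreSQL syntax; the publication and table names are hypothetical.

```sql
-- Publish changes for specific (hypothetical) tables.
CREATE PUBLICATION orders_pub FOR TABLE orders, customers;

-- Publish changes for every table in the database.
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;
```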
Create a Publication for specific operation types
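For example, assuming the standard `publish` option is supported, a publication limited to INSERT events might look like this (names are hypothetical):

```sql
-- Only INSERT events on the hypothetical "orders" table are published.
CREATE PUBLICATION orders_insert_pub FOR TABLE orders
    WITH (publish = 'insert');
```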
Alter Publication
Add tables to a Publication
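A sketch in standard PostgreSQL syntax, with hypothetical publication and table names:

```sql
-- Add the hypothetical "shipments" table to an existing publication.
ALTER PUBLICATION orders_pub ADD TABLE shipments;
```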
Remove tables from a Publication
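A sketch in standard PostgreSQL syntax, with hypothetical publication and table names:

```sql
-- Stop publishing changes for the hypothetical "shipments" table.
ALTER PUBLICATION orders_pub DROP TABLE shipments;
```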
Change the operations published
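Assuming the standard `publish` option, the published operations could be changed as follows (the publication name is hypothetical):

```sql
-- Publish only INSERT and UPDATE events from now on.
ALTER PUBLICATION orders_pub SET (publish = 'insert, update');
```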
Check Publication
List all Publications
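A sketch that assumes Tacnode exposes the PostgreSQL `pg_publication` catalog:

```sql
-- One row per publication, with the operations it publishes.
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication;
```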
View details of a specific Publication
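A sketch that assumes the PostgreSQL `pg_publication_tables` view is available; the publication name is hypothetical.

```sql
-- List the tables that belong to one publication.
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'orders_pub';
```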
Drop Publication
Delete a Publication within the database where it was created.
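A sketch in standard PostgreSQL syntax; the publication name is hypothetical.

```sql
-- Run this while connected to the database that owns the publication.
DROP PUBLICATION IF EXISTS orders_pub;
```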
Replication Slot
Create Slot
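A minimal sketch, assuming Tacnode provides the PostgreSQL function `pg_create_logical_replication_slot`. The slot name is hypothetical; the second argument is one of the output plugins described earlier.

```sql
-- Create a logical slot that decodes with the pgoutput plugin.
SELECT * FROM pg_create_logical_replication_slot('orders_slot', 'pgoutput');
```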
Update Slot
In certain scenarios, you may need to manually advance the replication slot position (LSN):
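A sketch of one way to do this, assuming the PostgreSQL functions `pg_replication_slot_advance` and `pg_current_wal_lsn` are available in Tacnode. The slot name is hypothetical.

```sql
-- Move the slot forward to the current WAL position.
-- Changes between the old and new positions are skipped, so the
-- consumer will never receive them.
SELECT pg_replication_slot_advance('orders_slot', pg_current_wal_lsn());
```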
Check Slot
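A sketch that assumes the PostgreSQL `pg_replication_slots` view is available:

```sql
-- Show each slot, whether a consumer is attached, and its progress.
SELECT slot_name, plugin, slot_type, active,
       restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```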
Drop Slot
Slots can be removed from the database where they were created.
It is recommended to check the slot status before dropping.
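The check-then-drop sequence can be sketched as follows, assuming the PostgreSQL `pg_replication_slots` view and `pg_drop_replication_slot` function. The slot name is hypothetical.

```sql
-- Confirm that no consumer is attached (active = false) before dropping.
SELECT slot_name, active
FROM pg_replication_slots
WHERE slot_name = 'orders_slot';

-- Drop the slot once it is confirmed to be unused.
SELECT pg_drop_replication_slot('orders_slot');
```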
Consume CDC with Apache Flink
Apache Flink supports consuming CDC events using the official postgres-cdc connector. It can subscribe to a specified publication and, during checkpoint execution, confirm and update the LSN within the slot. For detailed usage, see Apache Flink Integration.
For more configuration options of the postgres-cdc connector, refer to the Postgres CDC Connector documentation.
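As an illustration only, a Flink SQL source table using the postgres-cdc connector might look like the sketch below. The host, port, credentials, database, table, slot, and publication names are all placeholders, and the column list is hypothetical; consult the linked pages for the options that apply to your deployment.

```sql
-- Flink SQL sketch; every connection value below is a placeholder.
CREATE TABLE orders_source (
    id INT,
    status STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'postgres-cdc',
    'hostname' = 'tacnode.example.com',
    'port' = '5432',
    'username' = 'cdc_user',
    'password' = 'cdc_password',
    'database-name' = 'mydb',
    'schema-name' = 'public',
    'table-name' = 'orders',
    'slot.name' = 'orders_slot',
    'decoding.plugin.name' = 'pgoutput',
    -- Passed through to Debezium: the publication to subscribe to.
    'debezium.publication.name' = 'orders_pub'
);
```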
Typical Example
Create test tables:
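A sketch with two hypothetical tables, `users` and `orders`; the columns are illustrative only.

```sql
-- Hypothetical test tables for the walkthrough.
CREATE TABLE users (
    id   INT PRIMARY KEY,
    name TEXT
);

CREATE TABLE orders (
    id      INT PRIMARY KEY,
    user_id INT,
    amount  NUMERIC(10, 2)
);

-- Capture full before-images, as recommended for Debezium-based consumers.
ALTER TABLE users REPLICA IDENTITY FULL;
ALTER TABLE orders REPLICA IDENTITY FULL;
```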
Create a publication:
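Assuming test tables named `users` and `orders` (hypothetical names), the publication could be created like this:

```sql
-- "demo_pub" is a hypothetical publication name.
CREATE PUBLICATION demo_pub FOR TABLE users, orders;
```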
Create a replication slot:
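A sketch using a hypothetical slot name and the pgoutput plugin, which is the format Flink CDC uses:

```sql
-- "demo_slot" is a hypothetical slot name.
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'pgoutput');
```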
For Flink CDC job configuration, refer to Apache Flink Integration.
Monitor slot status:
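One possible monitoring query, assuming Tacnode supports the PostgreSQL `pg_replication_slots` view and the `pg_wal_lsn_diff` and `pg_current_wal_lsn` functions. The slot name `demo_slot` is hypothetical.

```sql
-- Show whether the slot is in use and how far the consumer lags behind.
SELECT slot_name,
       active,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'demo_slot';
```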
Handle slot stalling. If the slot is found to be inactive (active=false) and not in use:
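The exact remedy depends on the situation. One plausible sketch, assuming the slot is genuinely no longer needed and that the PostgreSQL function `pg_drop_replication_slot` is available, is to drop it so that it stops retaining WAL. The slot name is hypothetical.

```sql
-- Double-check that the slot is still inactive.
SELECT slot_name, active
FROM pg_replication_slots
WHERE slot_name = 'demo_slot';

-- Assumption: the slot is abandoned, so dropping it is safe.
SELECT pg_drop_replication_slot('demo_slot');
```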
When jobs complete or require reconfiguration, follow these steps to clean up resources:
- Stop the Flink job. Ensure the Flink job has fully stopped and no longer uses the CDC connection.
- Delete the replication slot:
- Delete the publication:
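The two deletion steps above can be sketched as follows, using hypothetical slot and publication names and assuming the standard PostgreSQL function and statement are supported:

```sql
-- Delete the replication slot (only after the Flink job has stopped).
SELECT pg_drop_replication_slot('demo_slot');

-- Delete the publication.
DROP PUBLICATION IF EXISTS demo_pub;
```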
pg_logical Functions
You can validate CDC events inside the database using the pg_logical functions together with the test_decoding plugin.
Example:
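A minimal sketch, assuming Tacnode provides the PostgreSQL functions `pg_logical_slot_peek_changes` and `pg_logical_slot_get_changes`. The table and slot names are hypothetical.

```sql
-- Hypothetical table used only for this check.
CREATE TABLE cdc_demo (id INT PRIMARY KEY, note TEXT);

-- Create a slot that decodes into the textual test_decoding format.
SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');

-- Generate a change to decode.
INSERT INTO cdc_demo (id, note) VALUES (1, 'hello');

-- Peek at pending changes without consuming them.
SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);

-- Read the changes and advance the slot past them.
SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);

-- Clean up so the slot does not retain WAL.
SELECT pg_drop_replication_slot('test_slot');
DROP TABLE cdc_demo;
```

Peeking is useful for repeated inspection, because the same changes remain available until they are read with the get variant.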
Streaming Replication Protocol
The functions shown in the previous examples are useful for testing the export of replication data. For continuous consumption, a more efficient approach is to use the dedicated streaming replication protocol.
PostgreSQL drivers for various programming languages, such as the JDBC driver, typically provide wrappers for this protocol. Alternatively, you can connect using the standard method and manage the protocol manually. For details regarding the protocol, see Streaming Replication Protocol.