Apache Flink
Tacnode is a high-performance, distributed database compatible with the PostgreSQL protocol. It supports standard SQL syntax and the PostgreSQL ecosystem toolchain. This article explains how to write data to Tacnode efficiently using Apache Flink, focusing on two approaches: the DataStream API and Flink SQL. It also covers how to read CDC (Change Data Capture) incremental changes via Apache Flink.
Environment Preparation
Version Compatibility
- Flink version: 1.14+ (1.16+ recommended)
- JDBC driver: PostgreSQL JDBC driver (42.5+ recommended)
Dependency Configuration
Add the PostgreSQL JDBC driver dependency to your Flink project:
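A minimal Maven sketch is shown below. The exact artifacts and versions are assumptions and must be matched to your Flink release (the JDBC connector is versioned per Flink minor version); the postgres-cdc dependency is only needed if you consume CDC as described later.

```xml
<!-- Flink JDBC connector; pick the build that matches your Flink version
     (for Flink 1.16 and earlier, use the flink-connector-jdbc artifact shipped with that release) -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc</artifactId>
    <version>3.1.1-1.17</version>
</dependency>

<!-- PostgreSQL JDBC driver (42.5+ recommended) -->
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.5.4</version>
</dependency>

<!-- Optional: postgres-cdc connector, only needed for the CDC section below -->
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-postgres-cdc</artifactId>
    <version>2.4.1</version>
</dependency>
```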
Flink SQL to Tacnode Type Mapping
| Flink SQL Type | Tacnode Type |
| --- | --- |
| BOOLEAN | BOOLEAN |
| TINYINT | SMALLINT |
| SMALLINT | SMALLINT |
| INT | INTEGER |
| BIGINT | BIGINT |
| FLOAT | REAL |
| DOUBLE | DOUBLE PRECISION |
| DECIMAL(p,s) | NUMERIC(p,s) |
| VARCHAR(n) | VARCHAR(n) |
| CHAR(n) | CHAR(n) |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| ARRAY | ARRAY |
| MAP | JSONB |
| ROW | JSONB |
Writing with the DataStream API
Using the JDBC Sink
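A minimal sketch of a DataStream job that writes through Flink's JdbcSink. The orders table, its columns, the Order POJO, and the connection details are illustrative assumptions; Tacnode is addressed through its PostgreSQL-compatible endpoint.

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TacnodeJdbcSinkJob {

    /** Simple POJO used by the statement builder below. */
    public static class Order {
        public long id;
        public double amount;
        public Order() {}
        public Order(long id, double amount) { this.id = id; this.amount = amount; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source; replace with your real stream of records.
        DataStream<Order> orders = env.fromElements(new Order(1L, 9.99), new Order(2L, 19.99));

        orders.addSink(JdbcSink.sink(
                // Upsert using the PostgreSQL-compatible INSERT ... ON CONFLICT syntax.
                "INSERT INTO orders (id, amount) VALUES (?, ?) "
                        + "ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount",
                (statement, order) -> {
                    statement.setLong(1, order.id);
                    statement.setDouble(2, order.amount);
                },
                JdbcExecutionOptions.builder()
                        .withBatchSize(1000)        // rows buffered per batch
                        .withBatchIntervalMs(1000)  // flush at least once per second
                        .withMaxRetries(3)
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:postgresql://<tacnode-host>:5432/<database>")
                        .withDriverName("org.postgresql.Driver")
                        .withUsername("<user>")
                        .withPassword("<password>")
                        .build()));

        env.execute("write-to-tacnode");
    }
}
```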
Batch Write Optimization
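Batching is controlled through JdbcExecutionOptions. The snippet below shows the commonly tuned knobs; the values are illustrative starting points rather than Tacnode-verified recommendations.

```java
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;

// Larger batches mean fewer round trips to Tacnode; the flush interval bounds
// latency on low-traffic streams, and retries absorb transient connection errors.
JdbcExecutionOptions executionOptions = JdbcExecutionOptions.builder()
        .withBatchSize(5000)        // max rows buffered before a flush
        .withBatchIntervalMs(1000)  // flush at least every second
        .withMaxRetries(3)          // retry a failed batch before failing the job
        .build();
```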
Writing with Flink SQL
Registering the Tacnode Catalog
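Since Tacnode speaks the PostgreSQL protocol, Flink's JDBC catalog should be able to expose Tacnode tables directly. A sketch, with the catalog name, database, and credentials as placeholders:

```sql
-- Register a JDBC catalog backed by Tacnode's PostgreSQL-compatible endpoint
CREATE CATALOG tacnode_catalog WITH (
    'type' = 'jdbc',
    'default-database' = '<database>',
    'username' = '<user>',
    'password' = '<password>',
    'base-url' = 'jdbc:postgresql://<tacnode-host>:5432'
);

USE CATALOG tacnode_catalog;
SHOW TABLES;
```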
Inserting Data via Flink SQL
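A minimal sketch: declare a Flink sink table with the jdbc connector pointing at Tacnode, then INSERT INTO it. The Tacnode table orders, its columns, and the upstream table source_orders are placeholders.

```sql
-- Flink sink table mapped to an existing Tacnode table `orders`
CREATE TABLE tacnode_orders (
    id      BIGINT,
    amount  DOUBLE,
    ts      TIMESTAMP(3),
    PRIMARY KEY (id) NOT ENFORCED   -- enables upsert writes
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://<tacnode-host>:5432/<database>',
    'table-name' = 'orders',
    'username' = '<user>',
    'password' = '<password>',
    'sink.buffer-flush.max-rows' = '1000',
    'sink.buffer-flush.interval' = '1s',
    'sink.max-retries' = '3'
);

-- `source_orders` stands for any table or view already registered in Flink
INSERT INTO tacnode_orders
SELECT id, amount, ts FROM source_orders;
```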
Alternatively, submit via the Flink SQL client or web UI:
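For example, assuming the statements above were saved to a script file (the file name is arbitrary):

```bash
# -f runs the statements in the given script file
./bin/sql-client.sh -f tacnode_insert.sql
```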
Common Configuration Parameters
| Parameter Name | Description | Recommended Value |
| --- | --- | --- |
| sink.buffer-flush.max-rows | Max number of rows per batch | 1000-5000 |
| sink.buffer-flush.interval | Batch flush interval | 1s |
| sink.max-retries | Max retry attempts | 3 |
| sink.parallelism | Sink parallelism | Number of Tacnode nodes |
| connection.max-retry-timeout | Connection timeout | 30s |
For more configuration options, see: Apache Flink JDBC SQL Connector.
Consuming CDC with the postgres-cdc Connector
Flink retrieves CDC data from Tacnode using the postgres-cdc connector.
Preconditions
- Logical replication must be enabled on the database (`wal_level=logical`)
- The Flink runtime and Tacnode must be network-accessible to each other
- The database user requires the REPLICATION privilege, plus permission to create replication slots and publications if needed
See change-data-capture for details.
Publication & Slot Resource Management
Create publications in advance. Publications, as defined by the PostgreSQL protocol, manage server-side event publishing and filtering: specific tables or events can be selected for publishing, and partitioned-table events can be aggregated to the parent table. If no publication is created explicitly, the Flink CDC connector automatically generates a default dbz_publication that subscribes to all tables and change events; this default forces local filtering, lowering efficiency and increasing network overhead. Specify the publication in the Flink connector by setting 'debezium.publication.name' = 'pub-name'. Multiple Flink jobs can share the same publication.
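A sketch of creating a publication up front; the publication name flink_pub and the table list are placeholders, and the syntax assumes Tacnode follows PostgreSQL's CREATE PUBLICATION semantics:

```sql
-- Publish only the tables this pipeline needs; with publish_via_partition_root,
-- changes on partitions are published under the parent table
CREATE PUBLICATION flink_pub FOR TABLE public.orders, public.customers
    WITH (publish_via_partition_root = true);
```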
Slots track CDC consumer progress (the current LSN). Each consumer should use a unique slot so that it maintains an independent consumption state. It is recommended to pre-create slots with the noexport_snapshot parameter for efficient storage usage, and each Flink job should use a distinct slot. Example:
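Both forms below are standard PostgreSQL; Tacnode's exact slot-creation syntax may differ, so treat this as a sketch with flink_slot_job1 as a placeholder slot name:

```sql
-- Plain SQL form: this function creates the slot without exporting a snapshot
SELECT pg_create_logical_replication_slot('flink_slot_job1', 'pgoutput');

-- Replication-protocol form with an explicit NOEXPORT_SNAPSHOT option,
-- issued on a connection opened with replication=database
CREATE_REPLICATION_SLOT flink_slot_job1 LOGICAL pgoutput NOEXPORT_SNAPSHOT;
```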
Manually clean up unused slots after stopping a Flink consumer job to release pinned storage:
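For example, assuming Tacnode exposes the standard PostgreSQL slot views and functions:

```sql
-- Inspect slots and their progress, then drop the one no longer consumed
SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;
SELECT pg_drop_replication_slot('flink_slot_job1');
```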
Slot state advances when Flink checkpoints complete: each checkpoint confirms the LSN consumed so far, and the slot moves forward to it. Slots that do not advance keep old WAL segments pinned, wasting storage. CDC consumption can resume only from the latest completed checkpoint; historical checkpoints cannot be selected, because once an LSN is confirmed the slot advances and older LSNs are no longer available. Regardless of rollback attempts, jobs always resume from the most recent successful checkpoint.
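Checkpointing therefore needs to be enabled at a sensible interval so the slot can keep advancing; for example, in the SQL client (the interval is illustrative):

```sql
-- Flink SQL client: checkpoint every 60 seconds so the slot's LSN keeps advancing
SET 'execution.checkpointing.interval' = '60s';
```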
Flink CDC Source Example
The following example demonstrates reading from a Source table using CDC, then writing results into a Sink table.
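A sketch assuming a Tacnode table public.orders plus the publication and slot created above; the print sink stands in for a real sink table so the example stays self-contained:

```sql
-- CDC source reading logical changes from Tacnode via the postgres-cdc connector
CREATE TABLE orders_cdc (
    id      BIGINT,
    amount  DOUBLE,
    ts      TIMESTAMP(3),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'postgres-cdc',
    'hostname' = '<tacnode-host>',
    'port' = '5432',
    'username' = '<user>',
    'password' = '<password>',
    'database-name' = '<database>',
    'schema-name' = 'public',
    'table-name' = 'orders',
    'slot.name' = 'flink_slot_job1',
    'decoding.plugin.name' = 'pgoutput',          -- standard logical decoding plugin
    'debezium.publication.name' = 'flink_pub'     -- pre-created publication
);

-- Illustrative sink; replace with a real table (e.g. the jdbc sink declared earlier)
CREATE TABLE orders_out (
    id      BIGINT,
    amount  DOUBLE,
    ts      TIMESTAMP(3)
) WITH (
    'connector' = 'print'
);

INSERT INTO orders_out SELECT id, amount, ts FROM orders_cdc;
```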
Monitoring and Operations
Monitoring SQL Jobs
Monitor with Flink Web UI:
- Write throughput (records/s)
- Operator backpressure status
- Checkpoint state
Logging Configuration
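A sketch for conf/log4j.properties (Flink's log4j2-style properties) that raises logging for the JDBC connector and the Debezium-based CDC source; the logger names are the usual package roots and are assumptions to adjust as needed:

```properties
# Verbose logging for the JDBC sink and the Debezium-based CDC source
logger.jdbc.name = org.apache.flink.connector.jdbc
logger.jdbc.level = DEBUG
logger.debezium.name = io.debezium
logger.debezium.level = INFO
```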
Troubleshooting
Type Mapping Issues
Symptom: Write failures because of field type mismatch.
Resolution:
- Explicitly declare the expected column types in the Flink DDL (first statement in the sketch below)
- Use CAST to convert types in the query (second statement in the sketch below)
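A combined sketch covering both points; table and column names are placeholders:

```sql
-- 1) Declare column types in the Flink DDL so they line up with the Tacnode schema
CREATE TABLE tacnode_events (
    id         BIGINT,
    payload    VARCHAR(256),
    price      DECIMAL(10, 2),   -- maps to NUMERIC(10,2) in Tacnode
    created_at TIMESTAMP(3)
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://<tacnode-host>:5432/<database>',
    'table-name' = 'events',
    'username' = '<user>',
    'password' = '<password>'
);

-- 2) CAST mismatched source columns to the declared sink types
INSERT INTO tacnode_events
SELECT
    CAST(id AS BIGINT),
    CAST(payload AS VARCHAR(256)),
    CAST(price AS DECIMAL(10, 2)),
    CAST(event_time AS TIMESTAMP(3))
FROM source_events;
```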
Performance Bottlenecks
Symptom: Low SQL job throughput.
Resolution:
- Increase sink parallelism
- Tune the batch parameters (both are illustrated in the sketch after this list)
- Check Tacnode cluster load
- Optimize SQL query logic
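A sketch of the first two points applied to a jdbc sink table's WITH options; the values are illustrative starting points, not tested recommendations:

```sql
-- Tacnode sink tuned for higher throughput: more writers, larger batches
CREATE TABLE tacnode_orders_tuned (
    id     BIGINT,
    amount DOUBLE,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://<tacnode-host>:5432/<database>',
    'table-name' = 'orders',
    'username' = '<user>',
    'password' = '<password>',
    'sink.parallelism' = '4',                 -- e.g. match the number of Tacnode nodes
    'sink.buffer-flush.max-rows' = '5000',    -- larger batches, fewer round trips
    'sink.buffer-flush.interval' = '1s',
    'sink.max-retries' = '3'
);
```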