Export to Kafka
Tacnode DataSync supports exporting CDC (Change Data Capture) events to Kafka clusters, enabling seamless integration with downstream systems and real-time data processing workflows.
Export Formats
Data samples in this guide are based on the following example table:
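A minimal sketch of such a table (the definition below is illustrative; your actual table will differ):

```sql
-- Illustrative example table; the data samples below reference these columns
CREATE TABLE public.users (
    id         BIGINT PRIMARY KEY,
    name       TEXT,
    email      TEXT,
    updated_at TIMESTAMPTZ
);
```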
KV JSON Format
Outputs only the post-change row data, making it ideal for frontend state display and state synchronization scenarios.
Example Output:
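A sketch of a possible message for the illustrative users table above; by default the message key is derived from the primary key and the value carries the row after the change (the exact field layout depends on your table and DataSync version):

```json
{"id": 1001, "name": "Alice", "email": "alice@example.com", "updated_at": "2024-03-01 10:30:00"}
```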
Use Cases:
- Frontend state synchronization
- Real-time dashboard updates
- Simple event streaming
- Microservice state propagation
Maxwell JSON Format
Maxwell is a CDC JSON protocol that fully records all INSERT/UPDATE/DELETE operations.
Field Descriptions
| Field | Description |
|---|---|
| database | Database where the change operation occurred |
| table | Table where the change operation occurred |
| type | Operation type (insert / update / delete) |
| ts | Change timestamp (Unix timestamp) |
| xid | Change transaction ID |
| commit | Whether the transaction is committed |
| data | Row data after the change |
| old | Previous row data (for updates) |
Data Examples
INSERT Operation:
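A sketch of an insert event for the illustrative users table (all values are made up for illustration):

```json
{
  "database": "postgres",
  "table": "users",
  "type": "insert",
  "ts": 1709285400,
  "xid": 23456,
  "commit": true,
  "data": {
    "id": 1001,
    "name": "Alice",
    "email": "alice@example.com",
    "updated_at": "2024-03-01 10:30:00"
  }
}
```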
UPDATE Operation:
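Under the same assumptions, an update event carries the new row in data and the prior values of the changed columns in old (per Maxwell convention):

```json
{
  "database": "postgres",
  "table": "users",
  "type": "update",
  "ts": 1709285460,
  "xid": 23457,
  "commit": true,
  "data": {
    "id": 1001,
    "name": "Alice",
    "email": "alice.new@example.com",
    "updated_at": "2024-03-01 10:31:00"
  },
  "old": {
    "email": "alice@example.com",
    "updated_at": "2024-03-01 10:30:00"
  }
}
```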
DELETE Operation:
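And a delete event carries the last row image in data:

```json
{
  "database": "postgres",
  "table": "users",
  "type": "delete",
  "ts": 1709285520,
  "xid": 23458,
  "commit": true,
  "data": {
    "id": 1001,
    "name": "Alice",
    "email": "alice.new@example.com",
    "updated_at": "2024-03-01 10:31:00"
  }
}
```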
Export Job Configuration
1. Create Export Task
- Log into the Tacnode console
- Navigate to "Data Sync" → "Data Export" → "Data Export Jobs"
2. Select Target Type
When creating a task, select Kafka as the output target:
3. Configure Connection Information
Source Configuration
- Select the Tacnode instance you want to sync from on the left side
- Click "Test Connection" to test the connection
- A successful test will display "Connection Successful"
Sink Configuration (Kafka)
- Fill in the target Kafka connection information on the right side.
- Network Configuration:
  - First resolve network connectivity between DataSync and the target Kafka cluster.
  - Bootstrap Server: enter the Kafka server address and port.
  - For CDC push over the public network, modify Kafka's `advertised.listeners` property to avoid connection issues.
- Security Protocol: supports "PLAINTEXT", "SASL_PLAINTEXT", and "SASL_SSL"; choose based on your requirements.
  - PLAINTEXT: plain-text transmission, suitable for internal networks.
  - SASL_PLAINTEXT: SASL authentication with plain-text transmission, suitable for scenarios requiring authentication but not encryption.
  - SASL_SSL: SASL authentication with SSL-encrypted transmission, suitable for public networks or high-security requirements.
- Click "Test Connection" after completing the configuration.
- A successful test will display "Connection Successful".
Example Configuration:
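As a sketch, a sink pointing at a SASL_SSL-secured cluster might be filled in like this (hostnames are placeholders; field names follow the form fields described above):

```text
Bootstrap Server:  kafka-1.example.com:9092,kafka-2.example.com:9092
Security Protocol: SASL_SSL
(plus the SASL credentials required by your cluster)
```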
4. Select Sync Objects
Choose the schemas and tables you want to synchronize:
Optional: Sync Materialized Views (MV)
If you need to sync materialized views, check the Mview category in the configuration.
Object Selection Examples:
5. Output Configuration
Sync Mode
Three sync mode options are available:
- FULL: Reads the current data snapshot of the entire table, writes it to Kafka, then exits the task. Suitable for one-time data migration scenarios.
- INCREMENTAL: Reads only incremental CDC events and continuously writes them to Kafka. Suitable for real-time data sync scenarios.
- FULL + INCREMENTAL: First reads the current data snapshot of the table, then continuously pushes all CDC JSON starting from the snapshot point. Suitable for scenarios requiring both historical data and real-time sync.
Mode Selection Guidelines:
Other Configuration Items
- Topic: Kafka topic name to write to (must be pre-created)
- Zone ID: Task timezone, defaults to UTC
Topic Management:
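Because the topic must exist before the job starts, create it ahead of time with Kafka's standard tooling; for example (topic name and sizing are illustrative):

```bash
kafka-topics.sh --create \
  --bootstrap-server kafka-1.example.com:9092 \
  --topic tacnode_cdc_users \
  --partitions 3 \
  --replication-factor 3
```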
6. Advanced Parameter Configuration
Timezone Handling
Zone ID is a JDK-based timezone handling parameter. Since the Maxwell format has no `timestamptz` type, the conversion of PostgreSQL's `timestamptz` to Maxwell formats values according to the configured timezone and then removes the timezone information. Other fields follow Maxwell standards and remain consistent with Maxwell tools.
Valid Zone ID values follow Java timezone identifiers; commonly used ones include:
- UTC: Coordinated Universal Time (default)
- Asia/Shanghai: China Standard Time
- America/New_York: US Eastern Time
- Europe/London: UK Time
- Japan: Japan Time
Example Timezone Configuration:
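For example, with Zone ID set to Asia/Shanghai, a `timestamptz` value stored as 2024-03-01 02:30:00+00 would be emitted in the Maxwell data field as local time with the offset removed (illustrative):

```json
{"updated_at": "2024-03-01 10:30:00"}
```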
Other Parameter Descriptions
| Parameter | Description | Default |
|---|---|---|
| output_binlog_position | Whether to output binlog position information (PG position) | false |
| output_server_id | Whether to output the server ID (mock implementation) | false |
| output_thread_id | Whether to output the thread ID (mock implementation) | false |
| output_schema_id | Whether to output the incremental schema-change ID | false |
| output_primary_keys | Whether to output primary key values | true |
| output_primary_key_columns | Whether to output primary key column names | false |
| output_push_timestamp | Whether to output the send timestamp | false |
| kafka_key_format | Kafka key generation method | "primary_key" |
| producer_partition_by | Partitioning strategy | "key" |
| producer_partition_columns | Partition column names (for column-based partitioning) | null |
| producer_partition_by_fallback | Fallback method when the partitioning strategy fails | "random" |
| kafka_partition_hash | Partition hash algorithm | "murmur2" |
Advanced Configuration Example:
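A sketch combining several of the parameters above, here keying and partitioning messages by the primary key column id (the exact configuration syntax depends on the console; values are illustrative):

```json
{
  "output_primary_keys": true,
  "output_primary_key_columns": true,
  "output_push_timestamp": true,
  "kafka_key_format": "primary_key",
  "producer_partition_by": "column",
  "producer_partition_columns": "id",
  "producer_partition_by_fallback": "random",
  "kafka_partition_hash": "murmur2"
}
```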
Monitoring and Troubleshooting
Export Job Monitoring
Common Issues and Solutions
Connection Issues:
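Typical checks when the sink cannot be reached:
- Verify network connectivity between DataSync and the Kafka cluster (firewalls, security groups, routing).
- For public-network push, confirm that Kafka's `advertised.listeners` resolves to an address reachable by DataSync.
- Make sure the selected security protocol (PLAINTEXT / SASL_PLAINTEXT / SASL_SSL) and credentials match the cluster's configuration.
- Confirm the target topic has been pre-created.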
Performance Optimization:
Best Practices
- Topic Design:
  - Use an appropriate partition count based on throughput requirements
  - Set an adequate replication factor for durability
  - Configure retention policies based on downstream consumption patterns
- Security Configuration:
  - Use SSL/TLS for data in transit
  - Implement proper authentication and authorization
  - Rotate credentials regularly
- Performance Optimization:
  - Tune batch sizes based on latency requirements
  - Monitor and adjust partition assignments
  - Use compression for large messages
- Error Handling:
  - Implement dead letter queues for failed messages
  - Set up alerting for export job failures
  - Configure retry policies for transient failures
This configuration enables robust, scalable CDC event streaming from Tacnode to Kafka, supporting diverse real-time data processing and integration scenarios.