Data Sync Source
Comprehensive guide to selecting and configuring data sources for Tacnode DataSync. Learn about supported databases, cloud services, and best practices for seamless data integration.
Selecting the appropriate data source is fundamental to successful data synchronization. The right choice ensures data integrity, optimizes sync performance, reduces integration complexity, and maintains high data quality standards.
Tacnode DataSync supports a comprehensive range of data sources, covering mainstream relational databases, big data messaging queues, NoSQL databases, and specialized cloud services.
Importance of Data Source Selection
Choosing the right data source enables:
- Data Integrity Assurance: Select sources that support required data formats and protocols
- Performance Optimization: Choose appropriate sync strategies based on source characteristics
- Reduced Integration Complexity: Use supported data sources to minimize development overhead
- Quality Guarantee: Select stable and reliable sources to ensure sync quality
Supported Data Sources
Input Data Sources
Tacnode DataSync can acquire and transmit data from the following sources:
| Data Source | Sync Capabilities | Primary Use Cases |
|---|---|---|
| Tacnode | Efficient full and incremental data sync | Tacnode instance-to-instance migration |
| PostgreSQL | Efficient full and incremental data sync | PostgreSQL database synchronization |
| MySQL | Efficient full and incremental data sync | MySQL database synchronization |
| Oracle | Efficient full and incremental data sync | Oracle database synchronization |
| Kafka | Real-time event sync, supports KVS JSON, DOUBLE SERIALIZED KVS JSON, CANAL JSON, CANAL PROTOBUF formats | Real-time data stream processing |
| MongoDB | Efficient full and incremental data sync | MongoDB database synchronization |
| Other Relational Databases | Support for protocol-compatible databases | Protocol-compatible database systems |
Output Data Sources
Currently, Tacnode DataSync supports exporting data to:
| Data Source | Export Capabilities | Configuration Requirements |
|---|---|---|
| Kafka | Export Tacnode database change events to Kafka topics, supports Maxwell and KVS formats | Kafka cluster connection info and topic configuration |
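A consumer of the exported change events needs to interpret the Maxwell-format JSON. The sketch below is a minimal, hypothetical Python example: the field names (`database`, `table`, `type`, `data`) follow Maxwell's documented JSON layout, and the in-memory `store` dict stands in for a real sink.

```python
import json

# Hypothetical Maxwell-format change event, as it would arrive on a Kafka topic.
event = json.loads("""
{"database": "shop", "table": "orders", "type": "insert",
 "ts": 1700000000, "data": {"id": 42, "status": "paid"}}
""")

def apply_change(event: dict, store: dict) -> None:
    """Apply an insert/update/delete change event to an in-memory store,
    keyed by the row's primary key ("id" is assumed here)."""
    key = event["data"]["id"]
    if event["type"] in ("insert", "update"):
        store[key] = event["data"]
    elif event["type"] == "delete":
        store.pop(key, None)

store = {}
apply_change(event, store)
```

A real consumer would read these events from the configured topic and apply them transactionally to the target system; the dispatch on `type` stays the same.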
Permission Configuration
Different data sources require specific permissions to ensure DataSync can properly access and synchronize data.
MySQL Permissions
For MySQL data sources, different sync types require different permissions:
Full Sync Permissions
-- MySQL 8.0+ no longer supports GRANT ... IDENTIFIED BY; create the user first
CREATE USER IF NOT EXISTS '${user}' IDENTIFIED BY '${password}';
GRANT SELECT, SHOW DATABASES ON *.* TO '${user}';
FLUSH PRIVILEGES;
Incremental Sync Permissions
-- MySQL 8.0+ no longer supports GRANT ... IDENTIFIED BY; create the user first
CREATE USER IF NOT EXISTS '${user}' IDENTIFIED BY '${password}';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '${user}';
FLUSH PRIVILEGES;
Binlog Configuration Requirements
Incremental sync requires proper MySQL binlog configuration:
Required Settings:
- server_id: set to a non-zero value
- log_bin: ON (enabled)
- binlog_format: ROW
- binlog_row_image: FULL
Verification Query:
-- Check binlog configuration
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'server_id';
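The verification queries above can be turned into an automated preflight check. The sketch below is a minimal example, assuming the `SHOW VARIABLES` results have already been fetched into a dict (the database round-trip itself is out of scope); the variable names and required values match the settings listed above.

```python
# Required binlog settings for incremental sync, as listed above.
REQUIRED = {"log_bin": "ON", "binlog_format": "ROW", "binlog_row_image": "FULL"}

def check_binlog(variables: dict) -> list:
    """Return a list of human-readable problems; an empty list means the
    source is ready for incremental sync."""
    problems = [
        f"{name} is {variables.get(name)!r}, expected {expected!r}"
        for name, expected in REQUIRED.items()
        if variables.get(name) != expected
    ]
    server_id = variables.get("server_id", "")
    if not str(server_id).strip() or str(server_id) == "0":
        problems.append("server_id must be set to a non-zero value")
    return problems
```

Running such a check before starting a sync job surfaces configuration problems early, instead of failing mid-replication.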
PostgreSQL Permissions
For PostgreSQL data sources, permission requirements vary by sync type:
Full Sync Permissions
-- Grant SELECT permissions on all tables in specified schemas
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};
Incremental Sync Permissions
-- Grant REPLICATION privilege to existing user
ALTER USER ${user} WITH REPLICATION;
-- Grant necessary permissions
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};
Logical Replication Configuration Requirements
Incremental sync requires proper PostgreSQL logical replication configuration:
Required Settings:
wal_level = logical
Verification Query:
-- Check WAL level
SHOW wal_level;
-- Check replication slots
SELECT * FROM pg_replication_slots;
Oracle Permissions
For Oracle data sources, grant the following permissions to ensure DataSync can properly access and sync data:
-- Basic query permissions
GRANT SELECT ANY TABLE TO ${user};
GRANT SELECT ANY DICTIONARY TO ${user};
-- Additional permissions for incremental sync (LogMiner-based)
GRANT EXECUTE ON DBMS_LOGMNR TO ${user};
GRANT SELECT ON V_$LOGMNR_CONTENTS TO ${user};
GRANT SELECT ON V_$ARCHIVED_LOG TO ${user};
GRANT SELECT ON V_$LOG TO ${user};
-- Oracle 12c and later also require the LOGMINING privilege
GRANT LOGMINING TO ${user};
Kafka Permissions
For Kafka data sources, ensure DataSync has the following permissions for data import or export:
Required Access:
- Read permissions on specified topics (input source)
- Write permissions on specified topics (output source)
- Access permissions to Kafka cluster metadata
Configuration Example:
# Kafka connection configuration
bootstrap.servers=kafka-cluster:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="${username}" password="${password}";
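The same connection properties can be assembled programmatically. The sketch below is a minimal example, assuming the kafka-python client library: the keyword names follow that library's `KafkaConsumer`/`KafkaProducer` parameters, which map one-to-one onto the properties shown above.

```python
def sasl_ssl_config(bootstrap: str, username: str, password: str) -> dict:
    """Build SASL_SSL client settings as keyword arguments for
    kafka-python's KafkaConsumer/KafkaProducer."""
    return {
        "bootstrap_servers": bootstrap,
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }

# Example: KafkaConsumer("my-topic", **sasl_ssl_config("kafka-cluster:9092", user, pw))
```

Keeping the credential handling in one helper makes it easier to rotate credentials and to keep secrets out of checked-in configuration files.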
MongoDB Permissions
For MongoDB data sources, grant the following permissions to ensure DataSync can properly access and sync data:
// Full sync: read access to the source database
db.createUser({
  user: "${user}",
  pwd: "${password}",
  roles: [ { role: "read", db: "${database}" } ]
})

// Incremental sync additionally requires read access to the oplog
// (the local database on a replica set member)
db.grantRolesToUser("${user}", [ { role: "read", db: "local" } ])
Configuration Guide
Configuring DataSync to connect to data sources typically involves the following steps:
Obtaining Access Credentials
Prepare the target data source’s address, port, username, password, database name (or Topic name, Project name, etc.), and necessary security credentials (such as Access Key/Secret Key, SSL certificates, etc.).
Security Recommendations:
- Use dedicated sync accounts, avoid using administrator accounts
- Regularly rotate access credentials
- Enable SSL/TLS encrypted transmission
Choosing Connection Method
Direct Connection
A direct network connection requires that the DataSync service IP addresses are added to the firewall whitelist in advance.
Configuration Points:
- Ensure network reachability
- Configure appropriate security group rules
- Test network connectivity
# Test network connectivity
telnet <data_source_host> <port>
# Test database connectivity
psql -h <host> -p <port> -U <username> -d <database>
mysql -h <host> -P <port> -u <username> -p <database>
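The `telnet` reachability test above can also be scripted for automated preflight checks. The sketch below is a minimal Python equivalent using only the standard library; it checks TCP reachability only, not authentication or database-level access.

```python
import socket

def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check_tcp("data-source-host", 3306) before attempting a sync job
```

If this returns False, fix network routing, security groups, or the firewall whitelist before debugging credentials or permissions.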
Tunnel Connection
Connect through a pre-established PrivateLink connection for more secure data transmission.
Configuration Points:
- Configure VPC endpoint
- Set up private network connection
- Verify connection security
Connection Testing
After saving the configuration, run a connection test to verify that DataSync can access the data source.
Testing Steps:
- Click “Test Connection” button
- Wait for test results
- Adjust configuration based on test results
- Retest until connection succeeds
Diagnostic queries for common connection issues:
-- Network connectivity test
SELECT 1;
-- Permission verification
SHOW GRANTS FOR CURRENT_USER;
-- Database accessibility test
SELECT current_database(), current_user;
Best Practices
Permission Configuration
- Principle of Least Privilege: Grant only the minimum permissions required for sync
- Dedicated Accounts: Create dedicated accounts for data synchronization
- Permission Auditing: Regularly review account permission configurations
-- Example: Create dedicated sync user for MySQL
CREATE USER 'datasync_user'@'%' IDENTIFIED BY 'strong_password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'datasync_user'@'%';
FLUSH PRIVILEGES;
-- Verify permissions
SHOW GRANTS FOR 'datasync_user'@'%';
Security Configuration
- Network Isolation: Use private network connections to avoid public network transmission
- Encrypted Transmission: Enable SSL/TLS to encrypt data transmission
- Access Control: Configure IP whitelists to restrict access
# Example: Secure connection configuration
datasync_config:
security:
ssl_enabled: true
ssl_cert_path: "/path/to/cert.pem"
ssl_key_path: "/path/to/key.pem"
whitelist_ips:
- "10.0.0.0/8"
- "192.168.0.0/16"
Performance Optimization
- Connection Pooling: Properly configure connection pool size
- Batch Processing: Enable batch data processing to improve efficiency
- Concurrency Control: Adjust concurrency based on source system performance
{
"connection_pool": {
"max_connections": 10,
"min_connections": 2,
"connection_timeout": 30
},
"batch_processing": {
"batch_size": 1000,
"max_batch_interval": 5000
},
"concurrency": {
"max_workers": 4,
"queue_size": 1000
}
}
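The `batch_size` setting above groups records so that each write to the target covers many rows instead of one. The sketch below shows the size-based part of that batching as a generator; flushing on `max_batch_interval` would additionally require a timer, which is omitted here for brevity.

```python
def batches(records, batch_size=1000):
    """Yield records in lists of at most batch_size, matching the
    batch_size setting in the configuration above."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Larger batches amortize per-request overhead but increase memory use and replication lag for the last rows in each batch, which is why the interval-based flush exists alongside the size limit.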
Monitoring and Alerting
- Connection Status: Monitor data source connection status
- Performance Metrics: Monitor sync performance indicators
- Exception Alerting: Set up connection exception alerts
-- Monitor sync job status
SELECT
job_id,
source_type,
connection_status,
last_sync_time,
records_processed,
error_count
FROM datasync_job_status
WHERE source_type = 'mysql'
ORDER BY last_sync_time DESC;
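Rows from a status query like the one above can feed a simple alerting rule. The sketch below is a minimal example; the field names mirror the query's columns, while the threshold and the `"connected"` status value are assumptions to adapt to your monitoring setup.

```python
def jobs_to_alert(rows, max_error_rate=0.01):
    """Return job_ids whose error rate exceeds the threshold or whose
    connection is not healthy. Field names follow the status query above."""
    alerts = []
    for row in rows:
        processed = row["records_processed"]
        error_rate = row["error_count"] / processed if processed else 0.0
        if error_rate > max_error_rate or row["connection_status"] != "connected":
            alerts.append(row["job_id"])
    return alerts
```

Feeding this into an alerting channel catches both gradual degradation (rising error rate) and hard failures (lost connections) with one rule.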
Monitoring Dashboard Example:
- Connection health status
- Sync throughput (records/second)
- Error rates and types
- Resource utilization metrics
Through proper selection and configuration of data sources, you can ensure the stability, security, and efficiency of data synchronization, providing reliable data support for business operations.
Advanced Configuration Examples
High Availability Setup
# Multi-source failover configuration
datasync:
primary_source:
type: "mysql"
host: "primary-db.example.com"
port: 3306
failover_sources:
- type: "mysql"
host: "secondary-db.example.com"
port: 3306
- type: "mysql"
host: "tertiary-db.example.com"
port: 3306
failover_strategy: "automatic"
health_check_interval: 30
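The `automatic` failover strategy above can be sketched as an ordered health-checked selection: try the primary first, then each failover source in turn. This is a minimal illustration; the source representation and the health-check callable are assumptions, not the actual DataSync implementation.

```python
def pick_source(primary, failover_sources, is_healthy):
    """Return the first healthy source, preferring the primary, matching
    the ordered failover_sources list in the configuration above."""
    for source in [primary, *failover_sources]:
        if is_healthy(source):
            return source
    raise RuntimeError("no healthy data source available")
```

In a live system this selection would be re-evaluated every `health_check_interval` seconds, so the sync falls back to the primary automatically once it recovers.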
Multi-Region Data Sources
{
"regions": {
"us-east-1": {
"mysql_primary": "mysql-us-east.example.com:3306",
"kafka_cluster": "kafka-us-east.example.com:9092"
},
"eu-west-1": {
"mysql_primary": "mysql-eu-west.example.com:3306",
"kafka_cluster": "kafka-eu-west.example.com:9092"
}
},
"cross_region_sync": true,
"compression_enabled": true
}
This comprehensive approach to data source selection and configuration ensures optimal performance and reliability for your Tacnode DataSync operations.