
Data Sync Source

Selecting the appropriate data source is fundamental to successful data synchronization. The right choice ensures data integrity, optimizes sync performance, reduces integration complexity, and maintains high data quality standards.

Tacnode DataSync supports a comprehensive range of data sources, covering mainstream relational databases, big data messaging queues, NoSQL databases, and specialized cloud services.

Importance of Data Source Selection

Choosing the right data source enables:

  • Data Integrity Assurance: Select sources that support required data formats and protocols
  • Performance Optimization: Choose appropriate sync strategies based on source characteristics
  • Reduced Integration Complexity: Use supported data sources to minimize development overhead
  • Quality Guarantee: Select stable and reliable sources to ensure sync quality

Supported Data Sources

Input Data Sources

Tacnode DataSync can acquire and transmit data from the following sources:

| Data Source | Sync Capabilities | Primary Use Cases |
| --- | --- | --- |
| Tacnode | Efficient full and incremental data sync | Tacnode instance-to-instance migration |
| PostgreSQL | Efficient full and incremental data sync | PostgreSQL database synchronization |
| MySQL | Efficient full and incremental data sync | MySQL database synchronization |
| Oracle | Efficient full and incremental data sync | Oracle database synchronization |
| Kafka | Real-time event sync; supports KVS JSON, DOUBLE SERIALIZED KVS JSON, CANAL JSON, and CANAL PROTOBUF formats | Real-time data stream processing |
| MongoDB | Efficient full and incremental data sync | MongoDB database synchronization |
| Alibaba Cloud ADB (AnalyticDB) | Efficient full sync | Alibaba Cloud analytical database |
| Alibaba Cloud DataHub | Real-time event sync | Alibaba Cloud real-time data processing |
| Alibaba Cloud SLS (Simple Log Service) | Real-time event sync | Alibaba Cloud log service |
| Other relational databases | Support for protocol-compatible databases | Protocol-compatible database systems |

Output Data Sources

Currently, Tacnode DataSync supports exporting data to:

| Data Source | Export Capabilities | Configuration Requirements |
| --- | --- | --- |
| Kafka | Export Tacnode database change events to Kafka topics; supports Maxwell and KVS formats | Kafka cluster connection info and topic configuration |
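
For reference, a Maxwell-format change event exported to a Kafka topic looks roughly like the following; the database, table, timestamp, and field values are purely illustrative:

{
  "database": "tacnode_db",
  "table": "orders",
  "type": "insert",
  "ts": 1700000000,
  "data": {
    "id": 1001,
    "status": "created"
  }
}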

Permission Configuration

Different data sources require specific permissions to ensure DataSync can properly access and synchronize data.

MySQL Permissions

For MySQL data sources, different sync types require different permissions:

Full Sync Permissions

-- MySQL 8.0+ no longer allows IDENTIFIED BY inside GRANT; create the user first
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${password}';
GRANT SELECT, SHOW DATABASES ON *.* TO '${user}'@'%';

Incremental Sync Permissions

-- Grant replication privileges to the same dedicated user
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '${user}'@'%';

Binlog Configuration Requirements

Incremental sync requires proper MySQL binlog configuration:

Required Settings:

  • server_id: set to a non-zero value
  • log_bin: ON (binary logging enabled)
  • binlog_format: ROW
  • binlog_row_image: FULL
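
For reference, a minimal my.cnf fragment that satisfies these requirements might look like the following; the server_id value and log file base name are placeholders to adapt to your environment:

[mysqld]
server_id        = 1            # any non-zero value, unique within the replication topology
log_bin          = mysql-bin    # enables binary logging
binlog_format    = ROW
binlog_row_image = FULL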

Verification Queries:

-- Check binlog configuration
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'server_id';

PostgreSQL Permissions

For PostgreSQL data sources, permission requirements vary by sync type:

Full Sync Permissions

-- Grant SELECT permissions on all tables in specified schemas
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};

Incremental Sync Permissions

-- Grant REPLICATION privilege to existing user
ALTER USER ${user} WITH REPLICATION;
 
-- Grant necessary permissions
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};

Logical Replication Configuration Requirements

Incremental sync requires proper PostgreSQL logical replication configuration:

Required Settings:

  • wal_level = logical
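
If wal_level is not already logical, it can be changed with ALTER SYSTEM; note that this particular parameter only takes effect after a server restart:

-- Switch the WAL level to logical (requires a PostgreSQL restart to apply)
ALTER SYSTEM SET wal_level = 'logical';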

Verification Queries:

-- Check WAL level
SHOW wal_level;
 
-- Check replication slots
SELECT * FROM pg_replication_slots;

Oracle Permissions

For Oracle data sources, grant the following permissions to ensure DataSync can properly access and sync data:

-- Basic query permissions
GRANT SELECT ANY TABLE TO ${user};
GRANT SELECT ANY DICTIONARY TO ${user};
 
-- Additional permissions for incremental sync
GRANT EXECUTE ON DBMS_LOGMNR TO ${user};
GRANT SELECT ON V_$LOGMNR_CONTENTS TO ${user};
GRANT SELECT ON V_$ARCHIVED_LOG TO ${user};
GRANT SELECT ON V_$LOG TO ${user};
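
LogMiner-based incremental capture also generally assumes the database is running in ARCHIVELOG mode with supplemental logging enabled; the exact prerequisites depend on your DataSync version, so treat the query below as a sanity check of the current state rather than an official requirement list:

-- Check archive log mode and minimal supplemental logging
SELECT log_mode, supplemental_log_data_min FROM v$database;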

Kafka Permissions

For Kafka data sources, ensure DataSync has the following permissions for data import or export:

Required Access:

  • Read permissions on specified topics (input source)
  • Write permissions on specified topics (output source)
  • Access permissions to Kafka cluster metadata

Configuration Example:

# Kafka connection configuration
bootstrap.servers=kafka-cluster:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="${username}" password="${password}";
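
If the cluster enforces ACL-based authorization, the topic permissions listed above can be granted with Kafka's standard ACL tool; the principal and topic names below are placeholders, and authentication options (for example --command-config) are omitted for brevity:

# Allow the DataSync principal to read from a source topic
kafka-acls.sh --bootstrap-server kafka-cluster:9092 \
  --add --allow-principal User:datasync \
  --operation Read --topic source_topic

# Allow the DataSync principal to write to an export topic
kafka-acls.sh --bootstrap-server kafka-cluster:9092 \
  --add --allow-principal User:datasync \
  --operation Write --topic export_topic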

MongoDB Permissions

For MongoDB data sources, grant the following permissions to ensure DataSync can properly access and sync data:

// Full sync: grant read access to the source database
db.getSiblingDB("admin").createUser({
  user: "${user}",
  pwd: "${password}",
  roles: [ { role: "read", db: "${database}" } ]
})

// Incremental sync: additionally grant read on the local database,
// which holds the oplog
db.getSiblingDB("admin").grantRolesToUser("${user}", [
  { role: "read", db: "local" }
])

Configuration Guide

Configuring DataSync to connect to data sources typically involves the following steps:

Obtaining Access Credentials

Prepare the target data source's address, port, username, password, database name (or Topic name, Project name, etc.), and necessary security credentials (such as Access Key/Secret Key, SSL certificates, etc.).

Security Recommendations:

  • Use dedicated sync accounts, avoid using administrator accounts
  • Regularly rotate access credentials
  • Enable SSL/TLS encrypted transmission

Choosing Connection Method

Direct Connection

A direct network connection requires that the DataSync service IP addresses be added to the data source's firewall whitelist in advance.

Configuration Points:

  • Ensure network reachability
  • Configure appropriate security group rules
  • Test network connectivity
# Test network connectivity
telnet <data_source_host> <port>
 
# Test database connectivity
psql -h <host> -p <port> -U <username> -d <database>
mysql -h <host> -P <port> -u <username> -p <database>

Tunnel Connection

A tunnel connection routes traffic through a pre-established PrivateLink connection for more secure data transmission.

Configuration Points:

  • Configure VPC endpoint
  • Set up private network connection
  • Verify connection security

Connection Testing

After saving the configuration, run a connection test to verify that DataSync can access the data source.

Testing Steps:

  1. Click "Test Connection" button
  2. Wait for test results
  3. Adjust configuration based on test results
  4. Retest until connection succeeds

Diagnostic Queries for Common Connection Issues:

-- Basic connectivity test (works on most SQL data sources)
SELECT 1;

-- Permission verification (MySQL)
SHOW GRANTS FOR CURRENT_USER;

-- Database accessibility test (PostgreSQL)
SELECT current_database(), current_user;

Best Practices

Permission Configuration

  1. Principle of Least Privilege: Grant only the minimum permissions required for sync
  2. Dedicated Accounts: Create dedicated accounts for data synchronization
  3. Permission Auditing: Regularly review account permission configurations
-- Example: Create dedicated sync user for MySQL
CREATE USER 'datasync_user'@'%' IDENTIFIED BY 'strong_password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'datasync_user'@'%';
FLUSH PRIVILEGES;
 
-- Verify permissions
SHOW GRANTS FOR 'datasync_user'@'%';

Security Configuration

  1. Network Isolation: Use private network connections to avoid public network transmission
  2. Encrypted Transmission: Enable SSL/TLS to encrypt data transmission
  3. Access Control: Configure IP whitelists to restrict access
# Example: Secure connection configuration
datasync_config:
  security:
    ssl_enabled: true
    ssl_cert_path: "/path/to/cert.pem"
    ssl_key_path: "/path/to/key.pem"
    whitelist_ips:
      - "10.0.0.0/8"
      - "192.168.0.0/16"

Performance Optimization

  1. Connection Pooling: Properly configure connection pool size
  2. Batch Processing: Enable batch data processing to improve efficiency
  3. Concurrency Control: Adjust concurrency based on source system performance
{
  "connection_pool": {
    "max_connections": 10,
    "min_connections": 2,
    "connection_timeout": 30
  },
  "batch_processing": {
    "batch_size": 1000,
    "max_batch_interval": 5000
  },
  "concurrency": {
    "max_workers": 4,
    "queue_size": 1000
  }
}

Monitoring and Alerting

  1. Connection Status: Monitor data source connection status
  2. Performance Metrics: Monitor sync performance indicators
  3. Exception Alerting: Set up connection exception alerts
-- Monitor sync job status
SELECT 
    job_id,
    source_type,
    connection_status,
    last_sync_time,
    records_processed,
    error_count
FROM datasync_job_status 
WHERE source_type = 'mysql'
ORDER BY last_sync_time DESC;

Monitoring Dashboard Example:

  • Connection health status
  • Sync throughput (records/second)
  • Error rates and types
  • Resource utilization metrics

Proper selection and configuration of data sources ensures stable, secure, and efficient synchronization, providing reliable data support for business operations.

Advanced Configuration Examples

High Availability Setup

# Multi-source failover configuration
datasync:
  primary_source:
    type: "mysql"
    host: "primary-db.example.com"
    port: 3306
  failover_sources:
    - type: "mysql"
      host: "secondary-db.example.com" 
      port: 3306
    - type: "mysql"
      host: "tertiary-db.example.com"
      port: 3306
  failover_strategy: "automatic"
  health_check_interval: 30

Multi-Region Data Sources

{
  "regions": {
    "us-east-1": {
      "mysql_primary": "mysql-us-east.example.com:3306",
      "kafka_cluster": "kafka-us-east.example.com:9092"
    },
    "eu-west-1": {
      "mysql_primary": "mysql-eu-west.example.com:3306", 
      "kafka_cluster": "kafka-eu-west.example.com:9092"
    }
  },
  "cross_region_sync": true,
  "compression_enabled": true
}

This comprehensive approach to data source selection and configuration ensures optimal performance and reliability for your Tacnode DataSync operations.