
Data Sync Source

Selecting the appropriate data source is fundamental to successful data synchronization. The right choice ensures data integrity, optimizes sync performance, reduces integration complexity, and maintains high data quality standards.

Tacnode DataSync supports a comprehensive range of data sources, covering mainstream relational databases, big data messaging queues, NoSQL databases, and specialized cloud services.

Importance of Data Source Selection

Choosing the right data source enables:

  • Data Integrity Assurance: Select sources that support required data formats and protocols
  • Performance Optimization: Choose appropriate sync strategies based on source characteristics
  • Reduced Integration Complexity: Use supported data sources to minimize development overhead
  • Quality Guarantee: Select stable and reliable sources to ensure sync quality

Supported Data Sources

Input Data Sources

Tacnode DataSync can acquire and transmit data from the following sources:

| Data Source | Sync Capabilities | Primary Use Cases |
| --- | --- | --- |
| Tacnode | Efficient full and incremental data sync | Tacnode instance-to-instance migration |
| PostgreSQL | Efficient full and incremental data sync | PostgreSQL database synchronization |
| MySQL | Efficient full and incremental data sync | MySQL database synchronization |
| Oracle | Efficient full and incremental data sync | Oracle database synchronization |
| Kafka | Real-time event sync; supports KVS JSON, DOUBLE SERIALIZED KVS JSON, CANAL JSON, and CANAL PROTOBUF formats | Real-time data stream processing |
| MongoDB | Efficient full and incremental data sync | MongoDB database synchronization |
| Alibaba Cloud ADB (AnalyticDB) | Efficient full sync | Alibaba Cloud analytical database |
| Alibaba Cloud DataHub | Real-time event sync | Alibaba Cloud real-time data processing |
| Alibaba Cloud SLS (Simple Log Service) | Real-time event sync | Alibaba Cloud log service |
| Other relational databases | Support for protocol-compatible databases | Protocol-compatible database systems |

Output Data Sources

Currently, Tacnode DataSync supports exporting data to:

| Data Source | Export Capabilities | Configuration Requirements |
| --- | --- | --- |
| Kafka | Export Tacnode database change events to Kafka topics; supports Maxwell and KVS formats | Kafka cluster connection info and topic configuration |
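
For reference, a Maxwell-format change event exported to a Kafka topic looks roughly like the following; the database, table, timestamp, and field values are purely illustrative:

{
  "database": "tacnode_db",
  "table": "orders",
  "type": "insert",
  "ts": 1700000000,
  "data": {
    "id": 1001,
    "status": "created"
  }
}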

Permission Configuration

Different data sources require specific permissions to ensure DataSync can properly access and synchronize data.

MySQL Permissions

For MySQL data sources, different sync types require different permissions:

Full Sync Permissions

-- MySQL 8.0+ no longer allows IDENTIFIED BY inside GRANT; create the user first
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${password}';
GRANT SELECT, SHOW DATABASES ON *.* TO '${user}'@'%';

Incremental Sync Permissions

-- Grant replication privileges to the same dedicated user
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '${user}'@'%';

Binlog Configuration Requirements

Incremental sync requires proper MySQL binlog configuration:

Required Settings:

  • server_id: set to a non-zero value
  • log_bin: ON (binary logging enabled)
  • binlog_format: ROW
  • binlog_row_image: FULL
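
For reference, a minimal my.cnf fragment that satisfies these requirements might look like the following; the server_id value and log file base name are placeholders to adapt to your environment:

[mysqld]
server_id        = 1            # any non-zero value, unique within the replication topology
log_bin          = mysql-bin    # enables binary logging
binlog_format    = ROW
binlog_row_image = FULL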

Verification Queries:

-- Check binlog configuration
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'server_id';

PostgreSQL Permissions

For PostgreSQL data sources, permission requirements vary by sync type:

Full Sync Permissions

-- Grant SELECT permissions on all tables in specified schemas
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};

Incremental Sync Permissions

-- Grant REPLICATION privilege to existing user
ALTER USER ${user} WITH REPLICATION;
 
-- Grant necessary permissions
GRANT SELECT ON ALL TABLES IN SCHEMA ${schema1}[, ${other_schema}] TO ${user};
GRANT USAGE ON SCHEMA ${schema1}[, ${other_schema}] TO ${user};

Logical Replication Configuration Requirements

Incremental sync requires proper PostgreSQL logical replication configuration:

Required Settings:

  • wal_level = logical
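
If wal_level is not already logical, it can be changed with ALTER SYSTEM; note that this particular parameter only takes effect after a server restart:

-- Switch the WAL level to logical (requires a PostgreSQL restart to apply)
ALTER SYSTEM SET wal_level = 'logical';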

Verification Queries:

-- Check WAL level
SHOW wal_level;
 
-- Check replication slots
SELECT * FROM pg_replication_slots;

Oracle Permissions

For Oracle data sources, grant the following permissions to ensure DataSync can properly access and sync data:

-- Basic query permissions
GRANT SELECT ANY TABLE TO ${user};
GRANT SELECT ANY DICTIONARY TO ${user};
 
-- Additional permissions for incremental sync
GRANT EXECUTE ON DBMS_LOGMNR TO ${user};
GRANT SELECT ON V_$LOGMNR_CONTENTS TO ${user};
GRANT SELECT ON V_$ARCHIVED_LOG TO ${user};
GRANT SELECT ON V_$LOG TO ${user};
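
LogMiner-based incremental capture also generally assumes the database is running in ARCHIVELOG mode with supplemental logging enabled; the exact prerequisites depend on your DataSync version, so treat the query below as a sanity check of the current state rather than an official requirement list:

-- Check archive log mode and minimal supplemental logging
SELECT log_mode, supplemental_log_data_min FROM v$database;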

Kafka Permissions

For Kafka data sources, ensure DataSync has the following permissions for data import or export:

Required Access:

  • Read permissions on specified topics (input source)
  • Write permissions on specified topics (output source)
  • Access permissions to Kafka cluster metadata

Configuration Example:

# Kafka connection configuration
bootstrap.servers=kafka-cluster:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="${username}" password="${password}";
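
If the cluster enforces ACL-based authorization, the topic permissions listed above can be granted with Kafka's standard ACL tool; the principal and topic names below are placeholders, and authentication options (for example --command-config) are omitted for brevity:

# Allow the DataSync principal to read from a source topic
kafka-acls.sh --bootstrap-server kafka-cluster:9092 \
  --add --allow-principal User:datasync \
  --operation Read --topic source_topic

# Allow the DataSync principal to write to an export topic
kafka-acls.sh --bootstrap-server kafka-cluster:9092 \
  --add --allow-principal User:datasync \
  --operation Write --topic export_topic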

MongoDB Permissions

For MongoDB data sources, grant the following permissions to ensure DataSync can properly access and sync data:

// Full sync: grant read access to the source database
db.getSiblingDB("admin").createUser({
  user: "${user}",
  pwd: "${password}",
  roles: [ { role: "read", db: "${database}" } ]
})

// Incremental sync: additionally grant read on the local database,
// which holds the oplog
db.getSiblingDB("admin").grantRolesToUser("${user}", [
  { role: "read", db: "local" }
])

Configuration Guide

Configuring DataSync to connect to data sources typically involves the following steps:

Obtaining Access Credentials

Prepare the target data source's address, port, username, password, database name (or Topic name, Project name, etc.), and necessary security credentials (such as Access Key/Secret Key, SSL certificates, etc.).

Security Recommendations:

  • Use dedicated sync accounts, avoid using administrator accounts
  • Regularly rotate access credentials
  • Enable SSL/TLS encrypted transmission

Choosing Connection Method

Direct Connection

A direct network connection requires that the DataSync service IP addresses be added to the data source's firewall whitelist in advance.

Configuration Points:

  • Ensure network reachability
  • Configure appropriate security group rules
  • Test network connectivity
# Test network connectivity
telnet <data_source_host> <port>
 
# Test database connectivity
psql -h <host> -p <port> -U <username> -d <database>
mysql -h <host> -P <port> -u <username> -p <database>

Tunnel Connection

A tunnel connection routes traffic through a pre-established PrivateLink connection for more secure data transmission.

Configuration Points:

  • Configure VPC endpoint
  • Set up private network connection
  • Verify connection security

Connection Testing

After saving the configuration, run a connection test to verify that DataSync can access the data source.

Testing Steps:

  1. Click "Test Connection" button
  2. Wait for test results
  3. Adjust configuration based on test results
  4. Retest until connection succeeds

Diagnostic Queries for Common Connection Issues:

-- Basic connectivity test (works on most SQL data sources)
SELECT 1;

-- Permission verification (MySQL)
SHOW GRANTS FOR CURRENT_USER;

-- Database accessibility test (PostgreSQL)
SELECT current_database(), current_user;

Best Practices

Permission Configuration

  1. Principle of Least Privilege: Grant only the minimum permissions required for sync
  2. Dedicated Accounts: Create dedicated accounts for data synchronization
  3. Permission Auditing: Regularly review account permission configurations
-- Example: Create dedicated sync user for MySQL
CREATE USER 'datasync_user'@'%' IDENTIFIED BY 'strong_password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'datasync_user'@'%';
FLUSH PRIVILEGES;
 
-- Verify permissions
SHOW GRANTS FOR 'datasync_user'@'%';

Security Configuration

  1. Network Isolation: Use private network connections to avoid public network transmission
  2. Encrypted Transmission: Enable SSL/TLS to encrypt data transmission
  3. Access Control: Configure IP whitelists to restrict access
# Example: Secure connection configuration
datasync_config:
  security:
    ssl_enabled: true
    ssl_cert_path: "/path/to/cert.pem"
    ssl_key_path: "/path/to/key.pem"
    whitelist_ips:
      - "10.0.0.0/8"
      - "192.168.0.0/16"

Performance Optimization

  1. Connection Pooling: Properly configure connection pool size
  2. Batch Processing: Enable batch data processing to improve efficiency
  3. Concurrency Control: Adjust concurrency based on source system performance
{
  "connection_pool": {
    "max_connections": 10,
    "min_connections": 2,
    "connection_timeout": 30
  },
  "batch_processing": {
    "batch_size": 1000,
    "max_batch_interval": 5000
  },
  "concurrency": {
    "max_workers": 4,
    "queue_size": 1000
  }
}

Monitoring and Alerting

  1. Connection Status: Monitor data source connection status
  2. Performance Metrics: Monitor sync performance indicators
  3. Exception Alerting: Set up connection exception alerts
-- Monitor sync job status
SELECT 
    job_id,
    source_type,
    connection_status,
    last_sync_time,
    records_processed,
    error_count
FROM datasync_job_status 
WHERE source_type = 'mysql'
ORDER BY last_sync_time DESC;

Monitoring Dashboard Example:

  • Connection health status
  • Sync throughput (records/second)
  • Error rates and types
  • Resource utilization metrics

Proper selection and configuration of data sources ensures stable, secure, and efficient synchronization, providing reliable data support for business operations.

Advanced Configuration Examples

High Availability Setup

# Multi-source failover configuration
datasync:
  primary_source:
    type: "mysql"
    host: "primary-db.example.com"
    port: 3306
  failover_sources:
    - type: "mysql"
      host: "secondary-db.example.com" 
      port: 3306
    - type: "mysql"
      host: "tertiary-db.example.com"
      port: 3306
  failover_strategy: "automatic"
  health_check_interval: 30

Multi-Region Data Sources

{
  "regions": {
    "us-east-1": {
      "mysql_primary": "mysql-us-east.example.com:3306",
      "kafka_cluster": "kafka-us-east.example.com:9092"
    },
    "eu-west-1": {
      "mysql_primary": "mysql-eu-west.example.com:3306", 
      "kafka_cluster": "kafka-eu-west.example.com:9092"
    }
  },
  "cross_region_sync": true,
  "compression_enabled": true
}

This comprehensive approach to data source selection and configuration ensures optimal performance and reliability for your Tacnode DataSync operations.