Capacity Planning

Effective capacity planning in Tacnode involves understanding both storage and computing resource requirements for your specific use cases.

Storage Resource Planning

Data Compression Benefits

Tacnode uses an LSM (Log-Structured Merge Tree) architecture, which provides:

  • Efficient Storage: Data typically compresses to about one third of its original CSV size
  • Optimized Writes: Random writes are converted to sequential writes for better performance
  • Adaptive Compression: Dictionary encoding provides additional compression for low-cardinality columns

Estimating Storage Requirements

Historical Data

  • Start with your current data size in CSV format
  • Apply compression factor: Original Size × 0.33 = Estimated Tacnode Size
  • Columnar Benefits: Low-cardinality columns achieve even better compression rates

Incremental Data Growth

Calculate daily storage growth using:

Daily Growth = Daily Rows × Bytes per Row × Compression Factor (0.33)

Example Calculation:

  • 10M daily rows × 100 bytes/row = ~1GB raw per day; × 0.33 ≈ 330MB of daily growth
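
The two formulas above can be captured in a small sizing helper. This is a minimal sketch, assuming the 0.33 compression factor from this section; the function names are illustrative and not part of any Tacnode tooling:

```python
# Storage sizing sketch. The 0.33 factor is the guideline above;
# actual compression depends on schema and column cardinality.
COMPRESSION_FACTOR = 0.33

def estimate_historical_gb(raw_csv_gb: float) -> float:
    """Estimated Tacnode size for an existing dataset: Original Size × 0.33."""
    return raw_csv_gb * COMPRESSION_FACTOR

def estimate_daily_growth_gb(daily_rows: int, bytes_per_row: int) -> float:
    """Daily Growth = Daily Rows × Bytes per Row × Compression Factor."""
    raw_gb = daily_rows * bytes_per_row / 1e9  # bytes → GB
    return raw_gb * COMPRESSION_FACTOR

# Example from above: 10M rows/day × 100 bytes/row ≈ 1 GB raw ≈ 0.33 GB/day
print(f"{estimate_daily_growth_gb(10_000_000, 100):.2f} GB/day")  # ~0.33
```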

Storage Expansion Considerations

Compaction Process

  • LSM architecture performs regular data compaction
  • Temporary Expansion: Data may temporarily grow during compaction
  • Final Size: Stabilizes after compaction completes
  • Plan for 1.5-2x the steady-state storage as temporary headroom during compaction periods (see the sketch below)
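
A quick way to budget for that temporary expansion is to apply a headroom multiplier to the steady-state estimate. A minimal sketch, assuming the 1.5-2x guideline above; the helper name and default are illustrative:

```python
def provisioned_storage_gb(steady_state_gb: float, headroom: float = 2.0) -> float:
    """Storage to provision so data can temporarily expand during compaction.

    headroom: 1.5-2.0 per the guideline above; 2.0 is the conservative choice.
    """
    return steady_state_gb * headroom

# Example: 1 TB of compressed data plus 90 days of ~0.33 GB/day growth
steady_state = 1_000 + 90 * 0.33
print(f"Provision ~{provisioned_storage_gb(steady_state):.0f} GB")  # ~2059 GB
```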

Computing Resource Planning

Computing capacity in Tacnode is measured in Units, which can be dynamically scaled based on workload requirements.

Sizing Guidelines

Transactional Workloads (High Write Activity)

  • Ratio: 1 Unit per ~500GB of compressed data
  • Best for: Real-time applications with frequent updates
  • Characteristics: Consistent read/write operations, low latency requirements

Analytical Workloads (Read-Heavy, Cold Data)

  • Ratio: 1 Unit per 1TB-2TB of compressed data
  • Maximum: Do not exceed 2TB per Unit for optimal performance
  • Best for: Batch processing, historical analysis, infrequent access patterns
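
These ratios translate into a rough Unit-count calculation. Below is a minimal sketch under the guidelines above (1 Unit per ~500GB transactional, 1 Unit per 1-2TB analytical); the ratio table and function are illustrative, not a Tacnode API:

```python
import math

# Compressed GB per Unit, per the sizing guidelines above.
GB_PER_UNIT = {
    "transactional": 500,    # 1 Unit per ~500 GB of compressed data
    "analytical": 2_000,     # 1 Unit per 1-2 TB; 2 TB is the stated maximum
}

def recommend_units(compressed_gb: float, workload: str, min_units: int = 1) -> int:
    """Round up to whole Units for a given compressed data volume."""
    return max(min_units, math.ceil(compressed_gb / GB_PER_UNIT[workload]))

print(recommend_units(1_500, "transactional"))  # -> 3
print(recommend_units(20_000, "analytical"))    # -> 10
```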

Performance Considerations

Data Access Patterns

  • Hot Data: Recent data (daily/weekly) requires more computing power
  • Warm Data: Monthly data can use moderate Unit allocation
  • Cold Data: Historical data can operate with fewer Units per TB

Query Complexity

  • Complex aggregations and joins require additional computing capacity
  • Simple queries and scans can operate efficiently with baseline Units

Real-World Examples

Example 1: E-commerce Supply Chain

Scenario: Real-time inventory and order processing

  • Write QPS: 5K average, 28K peak
  • Data Volume: 1.5TB total
  • Daily Growth: 43GB raw → 14.3GB compressed
  • Recommended: 4 Units (transactional workload pattern)

Example 2: Industrial IoT Analytics

Scenario: Sensor data with scheduled batch processing

  • Data Volume: 20TB existing, 300GB daily raw → 100GB daily compressed
  • Access Pattern: Mostly cold data, periodic analysis
  • Recommended: 16 Units (analytical workload pattern, cold data optimization)
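
Checking these recommendations against the ratios from the sizing guidelines (a back-of-the-envelope sketch; the published Unit counts add judgment for peaks, growth, and access patterns):

```python
import math

# Example 1: transactional workload, 1.5 TB compressed
# 1 Unit per ~500 GB -> ceil(1500 / 500) = 3 Units from the ratio alone;
# the 4-Unit recommendation adds headroom for the 28K peak write QPS.
print(math.ceil(1_500 / 500))      # -> 3

# Example 2: analytical workload, 20 TB existing, mostly cold data
# 1 Unit per 1-2 TB -> between 10 and 20 Units; the 16-Unit recommendation
# falls within that range and leaves room for ~100 GB/day of compressed growth.
print(math.ceil(20_000 / 2_000))   # -> 10 (at 2 TB per Unit)
print(math.ceil(20_000 / 1_000))   # -> 20 (at 1 TB per Unit)
```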

Optimization Best Practices

Start Conservative

  • Begin with lower Unit allocation
  • Monitor performance metrics
  • Scale up based on actual usage patterns

Monitor Key Metrics

  • Query response times
  • CPU and memory utilization
  • Storage I/O patterns
  • Concurrent query capacity

Cost Optimization

  • Use pause/resume functionality for non-production environments
  • Right-size Units based on actual workload patterns
  • Consider time-based scaling for predictable workloads