Metrics

Nodegroup Metrics

Displays various metrics for a specific Nodegroup over a designated period, including:

  • Target and Current Size: Shows the Nodegroup's target size and current size. The current size is the number of units in normal service. While a unit is being added or capacity is being increased, Target Size > Current Size may occur temporarily; the two values then converge to Target Size = Current Size. If Target Size != Current Size persists, the system may be in an abnormal state, and you should contact the administrator.
  • Utilization: Reflects the utilization rate of Nodegroup resources, factoring in CPU and memory consumption. If this rate consistently exceeds 80%, capacity expansion may be necessary.
  • QPS: Indicates the number of SQL statements processed by Nodegroup each second, encompassing Select, Update, Insert, Delete, and Copy types.
  • SQL Latency: P99 is the execution time within which 99% of SQL statements on the Nodegroup complete; P90 is the corresponding value for 90% of statements. If the P99 or P90 latency remains abnormal for an extended period (such as several minutes), analyze it in conjunction with business and system conditions.
  • SQL Type Delays: P99 and P90 delays for various SQL types (select, insert, update, delete, copy) are recorded separately. If these indicators show long-term abnormalities (for example, several minutes), they should be analyzed alongside business and system conditions.
  • Network Throughput: Shows the Nodegroup's network throughput, including the total bytes received and sent.
  • Connections: Displays the total number of SQL connections on Nodegroup, categorizing them into active and idle connections.
  • Failed Query Count: Represents the number of SQL statements that failed to execute per second in Nodegroup. A sudden increase in this value necessitates an analysis of business and system conditions.
  • Affected Rows: Indicates the number of rows impacted by Nodegroup insert (INSERT), update (UPDATE), or delete (DELETE) operations per second. If any exceptions occur or the results diverge from expectations, an analysis in conjunction with business and system conditions is advised.
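
To make the P90 and P99 definitions above concrete, the following is a minimal sketch that computes a percentile with the nearest-rank method and converts nanoseconds to milliseconds. The nearest-rank method, the function names, and the sample latencies are assumptions chosen for illustration; this page does not state how the service itself calculates percentiles.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it.

    Nearest-rank is an assumed method for illustration only.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000


# Hypothetical execution times (nanoseconds) for ten SQL statements.
latencies_ns = [12e6, 15e6, 18e6, 20e6, 22e6, 25e6, 30e6, 38e6, 45e6, 400e6]

p90 = percentile_nearest_rank(latencies_ns, 90)
p99 = percentile_nearest_rank(latencies_ns, 99)
print(f"P90 = {ns_to_ms(p90):.1f} ms, P99 = {ns_to_ms(p99):.1f} ms")
```

In this sample a single slow statement raises P99 far above P90, which is why the two values are worth reading together.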

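Several of the metrics above are only a concern when a condition persists, for example utilization consistently above 80% or Target Size != Current Size after scaling should have finished. A minimal sketch of such a check follows. The `sustained` helper, the 60-second sampling interval, and the 180-second window are hypothetical; only the 80% threshold comes from this page.

```python
def sustained(samples, predicate, min_duration_s):
    """Return True if predicate(value) has held continuously up to the most
    recent sample for at least min_duration_s seconds.

    samples: list of (timestamp_seconds, value) in ascending time order.
    """
    start = None   # timestamp at which the current unbroken run began
    latest = None  # timestamp of the most recent sample
    for ts, value in samples:
        latest = ts
        if predicate(value):
            if start is None:
                start = ts
        else:
            start = None  # run broken, start over
    return start is not None and latest - start >= min_duration_s


# Hypothetical utilization samples taken every 60 seconds.
utilization = [(0, 0.70), (60, 0.85), (120, 0.90), (180, 0.88), (240, 0.83)]
needs_expansion = sustained(utilization, lambda v: v > 0.80, 180)

# Hypothetical (target, current) unit counts during a scale-up that completes.
sizes = [(0, (5, 5)), (60, (6, 5)), (120, (6, 5)), (180, (6, 6))]
size_mismatch = sustained(sizes, lambda tc: tc[0] != tc[1], 180)

print(needs_expansion, size_mismatch)
```

The second check stays False because the mismatch clears in the last sample, which matches the temporary Target Size > Current Size state described above.
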
Database Metrics

Shows the storage sizes for the Nodegroup and Database dimensions:

  • Storage size for each Nodegroup: Displays the total storage size for all databases within each Nodegroup.
  • Storage size for individual databases: Displays the storage size for each separate database.

The database storage size encompasses the total storage space used by all data, including table indexes and transaction logs. Data ingestion, modification, indexing, transaction processing, schema changes, replication, and snapshots can all influence the overall storage size.
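
Storage sizes are reported as raw byte counts, which are easy to misread. The sketch below converts a byte count to binary units and sums per-database sizes into a Nodegroup total. The function name and the database names are hypothetical, and the use of binary (1024-based) units is an assumption.

```python
def format_bytes(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# Hypothetical per-database sizes (bytes) within one Nodegroup.
database_sizes = {"db_a": 1_073_741_824, "db_b": 536_870_912}
nodegroup_total = sum(database_sizes.values())

for name, size in database_sizes.items():
    print(name, format_bytes(size))
print("nodegroup total", format_bytes(nodegroup_total))
```

Note that 1,073,741,824 bytes is 2^30 bytes, that is 1 GiB.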

Monitoring Metrics

Each metric row lists: Metric Key | Metric Name | Type | Sample Value. The labels and the description that apply to a group of rows are given with that group.

Nodegroup Computation Metrics

Labels:
  • _cloudProvider
  • _region
  • _datacloudId
  • _id
  • _name

nodegroup_expect_units | Expected Units | gauge | 5
nodegroup_running_units | Running Units | gauge | 5

Target/Current Units: Displays the target and current unit count of the Nodegroup. The current unit count reflects the number of units in normal service. During cluster creation or scaling, target units > current units may occur temporarily before the two values converge. If target units != current units persists, it may indicate an abnormal state; contact support if this occurs.

nodegroup_resource_percent_normalized | Utilization | gauge | 0.8

Resource Utilization: Indicates the overall resource utilization of the Nodegroup, incorporating both CPU and memory usage. If it is persistently above 80%, consider scaling clusters.

nodegroup_select_qps | Select QPS | gauge | 1000
nodegroup_update_qps | Update QPS | gauge | 1000
nodegroup_insert_qps | Insert QPS | gauge | 1000
nodegroup_delete_qps | Delete QPS | gauge | 1000
nodegroup_copy_qps | Copy QPS | gauge | 1

QPS: Number of SQL statements handled per second by the Nodegroup, including Select, Update, Insert, Delete, and Copy queries.

nodegroup_failure_qps | Failed Query QPS | gauge | 1

Failed Queries: Number of SQL statements that failed to execute per second in the Nodegroup. If this value surges, investigate it in conjunction with business and system status.

nodegroup_insert_affected_rows | Rows Affected by Insert | gauge | 10000
nodegroup_update_affected_rows | Rows Affected by Update | gauge | 10000
nodegroup_delete_affected_rows | Rows Affected by Delete | gauge | 10000
nodegroup_copy_affected_rows | Rows Affected by Copy | gauge | 10000

Rows Affected: Shows the number of rows impacted per second by INSERT, UPDATE, or DELETE operations executed by the Nodegroup. If anomalies or unexpected results occur, further analysis is required in conjunction with application and system status.

nodegroup_sql_select_p90_latency | Select Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_select_p99_latency | Select Latency (P99) | gauge | 38818282.52 (ns)
nodegroup_sql_insert_p90_latency | Insert Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_insert_p99_latency | Insert Latency (P99) | gauge | 38818282.52 (ns)
nodegroup_sql_update_p90_latency | Update Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_update_p99_latency | Update Latency (P99) | gauge | 38818282.52 (ns)
nodegroup_sql_delete_p90_latency | Delete Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_delete_p99_latency | Delete Latency (P99) | gauge | 38818282.52 (ns)
nodegroup_sql_copy_p90_latency | Copy Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_copy_p99_latency | Copy Latency (P99) | gauge | 38818282.52 (ns)

SQL Latency by Type: Collects P99 and P90 latency metrics for each type of SQL operation (SELECT, INSERT, UPDATE, DELETE, COPY) in the Nodegroup. Abnormal values that last several minutes or more should be investigated in relation to business processes and system conditions.

nodegroup_sql_service_p90_latency | SQL Latency (P90) | gauge | 38818282.52 (ns)
nodegroup_sql_service_p99_latency | SQL Latency (P99) | gauge | 38818282.52 (ns)

SQL Latency:
P99: 99th percentile of query execution duration measured in the Nodegroup.
P90: 90th percentile of query execution duration measured in the Nodegroup.
If the P99 or P90 latency remains abnormal for several minutes, diagnose it with reference to business processes and system status.

nodegroup_network_receive_bytes | Network Throughput (Receive) | gauge | 92468.533333 (bytes)
nodegroup_network_send_bytes | Network Throughput (Send) | gauge | 92468.533333 (bytes)

Network Throughput: Displays Nodegroup network throughput, including bytes received and sent.

nodegroup_active_sql_connections | Active SQL Connections | gauge | 100
nodegroup_idle_sql_connections | Idle SQL Connections | gauge | 20

Connections: Shows SQL connections on the Nodegroup, including active and idle connections.

Database Storage Metrics

Labels:
  • _cloudProvider
  • _region
  • _datacloudId
  • _handle

nodegroup_size_bytes | Database Storage Size | gauge | 1073741824 (1 GiB)

Per-database storage size: Displays the storage size for each database.

Includes all underlying physical storage used: table data, indexes, and WAL. Storage size is affected by insertions, updates, index rebuilds, transactions, schema changes, replication, and snapshots.

Backup Metrics

Labels:
  • _cloudProvider
  • _region
  • _datacloudId
  • _handle

backup_size_bytes | Backup Storage Size | gauge | 1073741824 (1 GiB)

Storage size per backup.

Data Sync Metrics

Labels:
  • _cloudProvider
  • _region
  • _datacloudId
  • _jobId
  • _jobName

datasync_source_idle_time | | gauge | todo

Source idle time (seconds): Current system time - last record event time. Increases when there is no incoming data.

datasync_emit_event_time | | gauge | todo

Delay for the most recently received data (seconds): Last system receipt timestamp - last event's business time. Will not increment when there's no data at the source.

datasync_source_heartbeat_time | | gauge | todo

Source heartbeat time (seconds): Metric generation time - most recent attempt to read source. Growth indicates downstream backpressure.

datasync_rps | | gauge | todo

Records per second.

datasync_bps | | gauge | todo

Bytes per second.
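
The three Data Sync time metrics are differences between timestamps, so a small worked example may help show how they behave when the source goes quiet. The sketch below follows the formulas stated above; the `SyncState` class, its field names, and the timestamps are hypothetical and are not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class SyncState:
    """Hypothetical snapshot of a sync job; all times are epoch seconds."""
    last_event_time: float         # business (event) time of the last record
    last_receive_time: float       # system time when that record was received
    last_read_attempt_time: float  # system time of the latest source read attempt


def source_idle_seconds(state, now):
    # Current system time - last record event time.
    return now - state.last_event_time


def emit_delay_seconds(state):
    # Last system receipt timestamp - last event's business time.
    return state.last_receive_time - state.last_event_time


def heartbeat_seconds(state, now):
    # Metric generation time - most recent attempt to read the source.
    return now - state.last_read_attempt_time


state = SyncState(last_event_time=900.0, last_receive_time=905.0,
                  last_read_attempt_time=998.0)

# First scrape at t=1000, second at t=1060 with no new data and a read at t=1058.
first = (source_idle_seconds(state, 1000.0), emit_delay_seconds(state),
         heartbeat_seconds(state, 1000.0))
state.last_read_attempt_time = 1058.0
second = (source_idle_seconds(state, 1060.0), emit_delay_seconds(state),
          heartbeat_seconds(state, 1060.0))
print(first, second)
```

Between the two scrapes the idle time grows from 100 to 160 seconds while the delay stays at 5 seconds, which matches the descriptions above: idle time increases without incoming data, and the delay does not.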
