Metrics
Nodegroup Metrics
Displays various metrics for a specific Nodegroup over a designated period, including:
- Target and Current Size: Shows the Nodegroup's target and current size in units. The current size is the number of units in normal service. While a unit is being added or capacity is being increased, Target Size > Current Size may occur temporarily before the two converge to Target Size = Current Size. If Target Size != Current Size persists, the system may be in an abnormal state; contact the administrator.
- Utilization: Reflects the utilization rate of Nodegroup resources, factoring in CPU and memory consumption. If this rate consistently exceeds 80%, capacity expansion may be necessary.
- QPS: Indicates the number of SQL statements processed by Nodegroup each second, encompassing Select, Update, Insert, Delete, and Copy types.
- SQL Latency: P99 is the latency within which 99% of SQL statements on the Nodegroup complete; P90 is the corresponding value for 90% of statements. If the P99 or P90 latency remains abnormal for an extended period (such as several minutes), analyze it in conjunction with business and system conditions.
- SQL Latency by Type: P99 and P90 latency is recorded separately for each SQL type (select, insert, update, delete, copy). If these indicators remain abnormal for an extended period (for example, several minutes), analyze them alongside business and system conditions.
- Network Throughput: Shows the Nodegroup's network throughput, including the total bytes received and sent.
- Connections: Displays the total number of SQL connections on Nodegroup, categorizing them into active and idle connections.
- Failed Query Count: Represents the number of SQL statements that failed to execute per second in Nodegroup. A sudden increase in this value necessitates an analysis of business and system conditions.
- Affected Rows: Indicates the number of rows impacted by Nodegroup insert (INSERT), update (UPDATE), or delete (DELETE) operations per second. If any exceptions occur or the results diverge from expectations, an analysis in conjunction with business and system conditions is advised.
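The persistence conditions described in the list above (utilization consistently over 80%, or Target Size != Current Size lasting beyond a scaling operation) can be sketched as a check over recent samples. This is a minimal illustration, not the product's alerting logic: the function names, the sample lists, and the window lengths are all assumptions made for the example.

```python
from typing import Sequence


def persistently_above(samples: Sequence[float], threshold: float, min_samples: int) -> bool:
    """Return True if the last `min_samples` samples all exceed `threshold`."""
    if min_samples <= 0 or len(samples) < min_samples:
        return False
    return all(value > threshold for value in samples[-min_samples:])


def size_mismatch_persists(target: Sequence[int], current: Sequence[int], min_samples: int) -> bool:
    """Return True if target != current in each of the last `min_samples` paired samples."""
    pairs = list(zip(target, current))
    if min_samples <= 0 or len(pairs) < min_samples:
        return False
    return all(t != c for t, c in pairs[-min_samples:])


# Hypothetical one-minute samples: utilization as a 0.0-1.0 ratio, sizes in units.
utilization = [0.72, 0.85, 0.88, 0.91, 0.86]
target_units = [5, 6, 6, 6, 6]
current_units = [5, 5, 5, 6, 6]

print(persistently_above(utilization, 0.8, 4))                 # True
print(size_mismatch_persists(target_units, current_units, 3))  # False
```

In this sample data the size mismatch clears after two samples, which matches the temporary Target Size > Current Size state expected during scaling.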
Database Metrics
Shows the storage sizes for the Nodegroup and Database dimensions:
- Storage size for each Nodegroup: Displays the total storage size for all databases within each Nodegroup.
- Storage size for individual databases: Displays the storage size for each separate database.
The database storage size is the total storage space used by all data, including table data, indexes, and transaction logs. Data ingestion, modification, indexing, transaction processing, schema changes, replication, and snapshots can all influence the overall storage size.
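The relationship between the two views can be sketched as a simple aggregation: the per-Nodegroup figure is the sum of the sizes of the databases it contains. The code below is an illustrative sketch only; the tuple layout, the Nodegroup and database names, and the helper functions are assumptions, not part of any product API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def storage_by_nodegroup(db_sizes: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum per-database sizes (bytes) into a total per Nodegroup."""
    totals: Dict[str, int] = defaultdict(int)
    for nodegroup, _database, size_bytes in db_sizes:
        totals[nodegroup] += size_bytes
    return dict(totals)


def format_bytes(size_bytes: int) -> str:
    """Render a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(size_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"  # not reached; keeps the return type explicit


# Hypothetical (nodegroup, database, size in bytes) samples.
samples = [
    ("ng-a", "orders", 1_073_741_824),
    ("ng-a", "users", 536_870_912),
    ("ng-b", "logs", 2_147_483_648),
]

for nodegroup, total in storage_by_nodegroup(samples).items():
    print(nodegroup, format_bytes(total))
# ng-a 1.50 GiB
# ng-b 2.00 GiB
```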
Monitoring Metrics
Metric Key | Metric Name | Type | Sample Value | Label | Description |
---|---|---|---|---|---|
Nodegroup Computation Metrics | |||||
nodegroup_expect_units | Expected Units | gauge | 5 | | Target/Current Units: Displays the target and current unit count of the Nodegroup. The current unit count is the number of units in normal service. During cluster creation or scaling, target units > current units may occur temporarily before the two converge. If target units != current units persists, it may indicate an abnormal state; contact support if this occurs. |
nodegroup_running_units | Running Units | gauge | 5 | ||
nodegroup_resource_percent_normalized | Utilization | gauge | 0.8 | | Resource Utilization: Indicates overall resource utilization of Nodegroup, incorporating both CPU and memory usage. If persistently above 80%, consider scaling clusters. |
nodegroup_select_qps | Select QPS | gauge | 1000 | | QPS: Number of SQL statements handled per second by Nodegroup, including Select, Update, Insert, Delete, and Copy queries. |
nodegroup_update_qps | Update QPS | gauge | 1000 | ||
nodegroup_insert_qps | Insert QPS | gauge | 1000 | ||
nodegroup_delete_qps | Delete QPS | gauge | 1000 | ||
nodegroup_copy_qps | Copy QPS | gauge | 1 | ||
nodegroup_failure_qps | Failed Query QPS | gauge | 1 | | Failed Queries: Number of SQL statements per second that failed to execute on Nodegroup. If this value surges, investigate it in conjunction with business and system status. |
nodegroup_insert_affected_rows | Rows Affected by Insert | gauge | 10000 | | Rows Affected: Shows the number of rows impacted per second by INSERT, UPDATE, or DELETE operations executed by Nodegroup. If anomalies or unexpected results occur, further analysis is required in conjunction with application and system status. |
nodegroup_update_affected_rows | Rows Affected by Update | gauge | 10000 | ||
nodegroup_delete_affected_rows | Rows Affected by Delete | gauge | 10000 | ||
nodegroup_copy_affected_rows | Rows Affected by Copy | gauge | 10000 | ||
nodegroup_sql_select_p90_latency | Select Latency (P90) | gauge | 38818282.52 (ns) | | SQL Latency by Type: Collects P99 and P90 latency metrics for each type of SQL operation (SELECT, INSERT, UPDATE, DELETE, COPY) in Nodegroup. Abnormal values that last several minutes or more should be investigated against business processes and system conditions. |
nodegroup_sql_select_p99_latency | Select Latency (P99) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_insert_p90_latency | Insert Latency (P90) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_insert_p99_latency | Insert Latency (P99) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_update_p90_latency | Update Latency (P90) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_update_p99_latency | Update Latency (P99) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_delete_p90_latency | Delete Latency (P90) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_delete_p99_latency | Delete Latency (P99) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_copy_p90_latency | Copy Latency (P90) | gauge | 38818282.52 (ns) | ||
nodegroup_sql_copy_p99_latency | Copy Latency (P99) | gauge | 38818282.52 (ns) | ||
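nodegroup_sql_service_p90_latency | SQL Latency (P90) | gauge | 38818282.52 (ns) | | SQL Latency: P90 and P99 latency across all SQL statements on Nodegroup. If either remains abnormal for an extended period (such as several minutes), analyze it in conjunction with business and system conditions. |
nodegroup_sql_service_p99_latency | SQL Latency (P99) | gauge | 38818282.52 (ns) | ||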
nodegroup_network_receive_bytes | Network Throughput (Receive) | gauge | 92,468.533333 (bytes) | | Network Throughput: Displays Nodegroup network throughput, including bytes received and sent. |
nodegroup_network_send_bytes | Network Throughput (Send) | gauge | 92,468.533333 (bytes) | ||
nodegroup_active_sql_connections | Active SQL Connections | gauge | 100 | | Connections: Shows SQL connections on Nodegroup, including active and idle connections. |
nodegroup_idle_sql_connections | Idle SQL Connections | gauge | 20 | ||
Database Storage Metrics | |||||
nodegroup_size_bytes | Database Storage Size | gauge | 1,073,741,824 (1 GiB) | | Per-database storage size: Displays the storage size for each database. Includes all underlying physical storage used: table data, indexes, and WAL. Storage size is affected by insertions, updates, index rebuilds, transactions, schema changes, replication, and snapshots. |
Backup Metrics | |||||
backup_size_bytes | Backup Storage Size | gauge | 1,073,741,824 (1 GiB) | | Storage size per backup |
Data Sync Metrics | |||||
datasync_source_idle_time | | gauge | todo | | Source idle time (seconds): Current system time - last record event time. Increases when there is no incoming data. |
datasync_emit_event_time | | gauge | todo | | Delay for the most recently received data (seconds): Last system receipt timestamp - last event's business time. Does not increase when there is no data at the source. |
datasync_source_heartbeat_time | | gauge | todo | | Source heartbeat time (seconds): Metric generation time - most recent attempt to read source. Growth indicates downstream backpressure. |
datasync_rps | | gauge | todo | | Records per second |
datasync_bps | | gauge | todo | | Bytes per second |
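
The three Data Sync timing metrics in the table are each defined as a difference between two timestamps. The sketch below restates those definitions as code to make the arithmetic explicit. It is an illustration only: the function names, parameter names, and sample timestamps are assumptions, and all values are taken to be in seconds.

```python
def source_idle_time(now_s: float, last_record_event_time_s: float) -> float:
    """Current system time minus the last record event time."""
    return now_s - last_record_event_time_s


def emit_event_delay(last_receipt_time_s: float, last_event_business_time_s: float) -> float:
    """Last system receipt timestamp minus the last event's business time."""
    return last_receipt_time_s - last_event_business_time_s


def source_heartbeat_time(metric_generated_at_s: float, last_source_read_attempt_s: float) -> float:
    """Metric generation time minus the most recent attempt to read the source."""
    return metric_generated_at_s - last_source_read_attempt_s


# Hypothetical timestamps, in seconds.
now = 1000.0
last_record_event = 940.0    # event time of the last record seen
last_receipt = 942.0         # when the system received that record
last_business_time = 940.0   # business time carried by that record
last_read_attempt = 998.0    # most recent attempt to read the source

print(source_idle_time(now, last_record_event))          # 60.0
print(emit_event_delay(last_receipt, last_business_time))  # 2.0
print(source_heartbeat_time(now, last_read_attempt))     # 2.0
```

With no new data arriving, only the first value keeps growing as `now` advances; the delay stays fixed because both of its inputs belong to the last received record.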