Metrics Monitored for Aviatrix Resources
The system metrics and network metrics that you can access in CoPilot are captured by the Aviatrix Controller. The Controller pulls the data from virtual machines (instances/hosts) that Aviatrix Gateways run on and feeds that data to CoPilot.
Some metrics can be used for triggering actions such as alerting and Gateway scaling. You can also use metrics to monitor the performance of Gateway hosts. You can monitor performance in CoPilot from the Monitor > Performance page.
In addition, with the Aviatrix Network Insights API add-on license, you can use APIs to analyze the performance and health of your Aviatrix-managed resources in external monitoring systems. See Monitoring with Network Insights API for information about using Network Insights Metric and Status APIs.
About Metrics that are Monitored for Aviatrix Resources
Aviatrix Controller captures system metric and network metric information about the virtual machines (instances/hosts) that Aviatrix Gateways run on.
Health-type metric information is also captured for Controller and CoPilot virtual machines. See Global Control Plane Health Alert.
Metrics that are monitored by Aviatrix Controller and Aviatrix CoPilot include the health metrics, system metrics, and network metrics listed in the sections that follow.
On the CoPilot Monitor > Performance page, you can select metrics to monitor performance on your resource VMs.
On the CoPilot Monitor > Notifications > Alerts Configuration page, you can configure how to use the pre-existing set of metrics to send notifications about events that occur in your network, such as performance bottlenecks or other problems.
To better understand how notifications and alerts work and how to configure them in CoPilot, see documentation:monitoring-troubleshooting:user-alerts-network.adoc.
For more information about integrating Aviatrix metric and status APIs with external monitoring tools, see Monitoring with Network Insights API.
Health Metrics for Triggering Notifications or Other Actions
The following health metrics are available in CoPilot. They are listed in alphabetical order, by the name used in the CoPilot UI.
Name (Health Metric) | Description | Internal Metric Name | Accessible by API |
---|---|---|---|
BGP Peering Status | Any BGP peering status change triggers an alert. | BGPpeeringStatus | |
Connection Status | Any connection status change on the specified gateways/connections triggers an alert. | ConnectionStatus | |
Gateway Status | Any gateway status change triggers an alert. | GatewayStatus | |
Underlay Connection Status | Monitors the syslog from any connection that includes the host as the source or destination. The alert is triggered when syslog data indicates a potential problem from both directions of the connection between that host and another host, within 30 seconds of each other. If the syslog data for the same connection later indicates that the problem is resolved from either direction, the alert is automatically resolved. | UnderlayConnectionStatus | |
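The trigger and resolve rules described for Underlay Connection Status can be sketched as a small state tracker. This is only an illustration of the logic as worded in the table, not CoPilot's implementation: the class name `UnderlayAlert`, the direction labels `a_to_b` and `b_to_a`, and the event shape are all hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 30  # "within 30 seconds of each other", per the description


@dataclass
class UnderlayAlert:
    """Tracks one connection between two hosts (hypothetical helper)."""

    last_problem: dict = field(default_factory=dict)  # direction -> timestamp
    active: bool = False

    def on_syslog(self, direction: str, timestamp: float, problem: bool) -> bool:
        """Feed one syslog-derived event and return whether the alert is active.

        direction is "a_to_b" or "b_to_a" (assumed labels); problem is True
        for a potential problem and False for a "resolved" indication.
        """
        if problem:
            self.last_problem[direction] = timestamp
            other = "b_to_a" if direction == "a_to_b" else "a_to_b"
            other_ts = self.last_problem.get(other)
            # Trigger only when both directions reported a problem within the window.
            if other_ts is not None and abs(timestamp - other_ts) <= WINDOW_SECONDS:
                self.active = True
        else:
            # A resolution seen from either direction clears the alert.
            self.last_problem.clear()
            self.active = False
        return self.active


alert = UnderlayAlert()
alert.on_syslog("a_to_b", 100.0, True)          # one direction only: not active yet
print(alert.on_syslog("b_to_a", 120.0, True))   # both directions, 20 s apart
print(alert.on_syslog("a_to_b", 200.0, False))  # resolved from one direction
```

A problem reported from only one direction, or from both directions more than 30 seconds apart, leaves the alert inactive in this sketch.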
System Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following system metrics. Aviatrix gateways report live Linux system statistics (such as memory, CPU, I/O, processes, and swap) for the instances/virtual machines on which they run. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.
Name (System Metric) | Description | Internal Metric Name | Accessible by API |
---|---|---|---|
CPU Idle (%) | Of the total CPU time, the percentage of time the CPU(s) spent idle and waiting for tasks from the kernel. | cpu_idle | |
CPU Kernel Space (%) | Of the total CPU time, the percentage of time spent running kernel code. | cpu_ks | |
CPU Steal (%) | Of the average CPU wait time on the host (VM/instance), the percentage of time a virtual CPU waits for a real CPU while the hypervisor services another virtual processor. | cpu_steal | |
CPU Used (%) | The percentage of CPU used. | cpu_used_per | |
CPU User Space (%) | Of the total CPU time, the percentage of time spent running non-kernel code. | cpu_us | |
CPU Wait (%) | Of the total CPU time, the percentage of time spent waiting for I/O. | cpu_wait | |
Disk Free | The storage space on the disk (volume) that is free/unused. | hdisk_free | |
Disk Free (%) | Of the total storage space on the disk (volume), the percentage of storage space that is free/unused. | hdisk_free_per | |
Disk Total | The total storage space on the disk (volume). | hdisk_tot | |
IO Blocks In | The number of blocks received per second from a block device. | io_blk_in | |
IO Blocks Out | The number of blocks sent per second to a block device. | io_blk_out | |
Memory Available | The amount of memory that is available to be allocated to new or existing processes. | memory_available | |
Memory Available (%) | Of the total memory, the percentage that is available to be allocated to new or existing processes. | memory_available_per | |
Memory Buffer | The amount of memory used as buffers. | memory_buf | |
Memory Cache | The amount of memory used as cache. | memory_cached | |
Memory Swapped | If swap is enabled, the amount of virtual memory used. | memory_swpd | |
Memory Total | The total memory. | memory_tot | |
Memory Used | The amount of memory used. | memory_used | |
Memory Used (%) | Of the total memory, the percentage of memory used. | memory_used_per | |
Processes Uninterruptible Sleep | The number of processes blocked waiting for I/O to complete. | nproc_non_int_sleep | |
Processes Waiting To Be Run | The number of processes that are running or waiting for run time. | nproc_running | |
Swaps From Disk | The amount of memory, in kilobytes, swapped in from disk per second. | swap_from_disk | |
Swaps To Disk | The amount of memory, in kilobytes, swapped out to disk per second. | swap_to_disk | |
System Context Switches | The number of context switches per second. | system_cs | |
System Interrupts | The number of interrupts per second, including the clock. | system_int | |
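To illustrate how the percentage metrics relate to their absolute counterparts, and how a threshold-style alert condition reads against them, here is a minimal sketch. The sample values, the `percent` and `breached` helpers, and the thresholds are invented for illustration. Only the metric names come from the table above, and CoPilot reports the percentage metrics itself rather than requiring you to derive them.

```python
def percent(part: float, total: float) -> float:
    """Return part as a percentage of total (0.0 when total is zero)."""
    return 100.0 * part / total if total else 0.0


def breached(sample: dict, metric: str, threshold: float, above: bool = True) -> bool:
    """Check one metric against a threshold (hypothetical alert condition)."""
    value = sample[metric]
    return value > threshold if above else value < threshold


# Hypothetical sample keyed by internal metric names; values and units are invented.
sample = {
    "memory_tot": 8000.0,
    "memory_used": 6800.0,
    "hdisk_tot": 64.0,
    "hdisk_free": 16.0,
}

# "Of the total memory, the percentage of memory used."
sample["memory_used_per"] = percent(sample["memory_used"], sample["memory_tot"])
# "Of the total storage space on the disk, the percentage that is free/unused."
sample["hdisk_free_per"] = percent(sample["hdisk_free"], sample["hdisk_tot"])

print(breached(sample, "memory_used_per", 80.0))               # memory used above 80%?
print(breached(sample, "hdisk_free_per", 20.0, above=False))   # disk free below 20%?
```

Note that "used" metrics are typically checked against an upper bound and "free" or "available" metrics against a lower bound, which is why the helper takes an `above` flag.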
Network Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following network metrics. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.
Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
---|---|---|---|
Bandwidth Egress Limit Exceeded | Bandwidth Egress Limit Exceeded | bandwidth_egress_limit_exceeded | |
Bandwidth Egress Limit Exceeded (%) | Bandwidth Egress Limit Exceeded (%) | per_bandwidth_egress_limit | |
Bandwidth Egress Limit Exceeded Rate | The number of tx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the Elastic Network Adapter (ENA) driver only on AWS. | rate_bandwidth_egress_limit_exceeded | |
Bandwidth Ingress Limit Exceeded | Bandwidth Ingress Limit Exceeded | bandwidth_ingress_limit_exceeded | |
Bandwidth Ingress Limit Exceeded (%) | The percentage of dropped rx packets due to exceeding the bandwidth allowance limit. This metric is specific to the ENA driver on AWS. | per_bandwidth_ingress_limit_exceeded | |
Bandwidth Ingress Limit Exceeded Rate | (AWS Only) The number of rx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the ENA driver only on AWS. | rate_bandwidth_ingress_limit_exceeded | |
Collisions during Transmission | The count of collisions during packet transmission. | tx_colls | |
Collisions Rate during Transmission | The number of collisions per second during packet transmission. | rate_tx_colls | |
Compressed Packets Received | The count of compressed packets received. | rx_compressed | |
Compressed Packets Received Rate | The number of compressed packets received per second. | rate_rx_compressed | |
Compressed Packets Transmitted | The count of compressed packets transmitted. | tx_compressed | |
Compressed Packets Transmitted Rate | The number of compressed packets transmitted per second. | rate_tx_compressed | |
Conntrack Allowance Available | (AWS Only) Reports the number of available tracked connections that can be established before an instance’s Connections Tracked allowance is exceeded. This metric is supplied by the Elastic Network Adapter (ENA) driver only on AWS. | conntrack_allowance_available | |
Conntrack Limit Exceeded | Conntrack Limit Exceeded | conntrack_limit_exceeded | |
Conntrack Limit Exceeded (%) | Conntrack Limit Exceeded (%) | per_conntrack_limit_exceeded | |
Conntrack Limit Exceeded Rate | Conntrack Limit Exceeded Rate | rate_conntrack_limit_exceeded | |
Conntrack Usage Rate | (AWS Only) The rate, in connections per second, at which conntrack capacity is being used up. This metric is only available in AWS, where the Conntrack Allowance Available (conntrack_allowance_available) metric is present. | conntrack_usage_rate | |
Drop Rate during Transmission | The number of packets dropped per second while sending. | rate_tx_drop | |
Drop Rate while Receiving | The number of packets dropped per second while receiving. | rate_rx_drop | |
Errored Packets Received | The count of received packets that are flagged by the kernel as errored. | rx_errs | |
Errored Packets Received Rate | The number of packets received per second that are flagged by the kernel as errored. | rate_rx_errs | |
Errored Packets Transmitted | The total number of transmit problems. | tx_errs | |
Errored Packets Transmitted Rate | The total number of transmit problems per second. | rate_tx_errs | |
Interface Drops during Transmission (%) | Interface Drops during Transmission (%) | per_tx_drop | |
Interface Drops while Receiving (%) | Interface Drops while Receiving (%) | per_rx_drop | |
Interface Errors during Transmission (%) | Interface Errors during Transmission (%) | per_tx_errs | |
Interface Errors while Receiving (%) | Interface Errors while Receiving (%) | per_rx_errs | |
Limit Exceeded Rate (PPS) - AWS Only | The number of packets per second, processed (bidirectionally) by the Aviatrix gateway, that exceed the maximum for the instance type. | rate_pps_limit_exceeded | |
Linklocal Limit Exceeded | Linklocal Limit Exceeded | linklocal_limit_exceeded | |
Linklocal Limit Exceeded (%) | Linklocal Limit Exceeded (%) | per_linklocal_limit_exceeded | |
Linklocal Limit Exceeded Rate | Linklocal Limit Exceeded Rate | rate_linklocal_limit_exceeded | |
Multicast Packets Received | Multicast Packets Received | rx_multicast | |
Multicast Packets Received Rate | The number of multicast packets received per second. | rate_rx_multicast | |
PPS Limit Exceeded | The count of bidirectional packets that exceed the maximum for the instance type and are handled by the Aviatrix gateway. | pps_limit_exceeded | |
PPS Limit Exceeded Drop (%) | PPS Limit Exceeded Drop (%) | per_pps_limit_exceeded | |
Packet Drop (%) | Packet Drop (%) | per_pkt_drop | |
Packet Drop Rate | The number of packets dropped per second. | rate_pkt_drop | |
Packet Failure (%) | Packet Failure (%) | per_pkt_fail | |
Packet Failure Rate | Packet Failure Rate | rate_pkt_fail | |
Packets Dropped during Transmission | The count of packets that were dropped during transmission, often due to resource constraints. | tx_drop | |
Packets Dropped while Receiving | The count of received packets that were not processed, typically due to resource limitations or unsupported protocols. | rx_drop | |
Peak Received Rate | Peak Received Rate | rate_peak_received | |
Peak Total Rate | Peak Total Rate | rate_peak_total | |
Peak Transmitted Rate | Peak Transmitted Rate | rate_peak_sent | |
Received Bytes | Received Bytes | rx_bytes | |
Received Frames Rate | Received Frames Rate | rate_rx_frame | |
Received Packets | Received Packets | rx_packets | |
Received Rate | Received Rate | rate_received | |
Received Rate (PPS) | The number of packets received per second. | pkt_rx_rate | |
Receiver FIFO Frames | Receiver FIFO Frames | rx_fifo | |
Receiver FIFO Frames Rate | The number of overflow events per second when receiving packets. | rate_rx_fifo | |
Received Frames | Received Frames | rx_frame | |
Total Attempted Rate | Total Attempted Rate | rate_pkt_attempted | |
Total Rate | The total (bidirectional) rate of bits processed per second by the interface on the Aviatrix VM/instance. | rate_total | |
Total Rate (in packets) | The total (bidirectional) number of packets processed per second. Instance size impacts how many packets per second the gateway can handle. | pkt_rate_total | |
Transmission FIFO Frames Rate | The number of frame transmission errors per second due to device FIFO underrun/underflow. | rate_tx_fifo | |
Transmission FIFO Frames | The number of frame transmission errors due to device FIFO underrun/underflow. | tx_fifo | |
Transmitted Bytes | Transmitted Bytes | tx_bytes | |
Transmitted Carrier Frames | Transmitted Carrier Frames | tx_carrier | |
Transmitted Carrier Frames Rate | Transmitted Carrier Frames Rate | rate_tx_carrier | |
Transmitted Packets | Transmitted Packets | tx_packets | |
Transmitted Rate | The rate, in bits per second, at which data is transmitted by the interface on the Aviatrix gateway VM/instance. | rate_sent | |
Transmitted Rate (PPS) | Transmitted Rate (PPS) | pkt_tx_rate | |
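Many of the metrics above come in pairs: a cumulative count (for example, rx_drop) and a per-second rate (for example, rate_rx_drop). The sketch below shows one common way to derive a per-second rate from two samples of a cumulative counter, and one plausible reading of how a usage rate could be derived from a shrinking allowance such as conntrack_allowance_available. Both functions are hypothetical illustrations with invented sample values. This page does not state how the Controller or CoPilot actually computes these rates.

```python
def per_second_rate(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Average per-second increase of a cumulative counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Assumption: a counter that went backwards (for example, after a reboot)
    # restarted from zero, so the current value is the whole delta.
    delta = curr_count - prev_count if curr_count >= prev_count else curr_count
    return delta / interval_s


def allowance_usage_rate(prev_available: int, curr_available: int, interval_s: float) -> float:
    """Per-second rate at which an available allowance is being consumed.

    A positive result means the allowance is shrinking; a negative result
    means capacity is being freed.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (prev_available - curr_available) / interval_s


# Two hypothetical samples taken 60 seconds apart.
print(per_second_rate(100, 160, 60.0))          # e.g. rx_drop -> rate_rx_drop
print(allowance_usage_rate(1000, 940, 60.0))    # e.g. conntrack_allowance_available
```

The reset handling in `per_second_rate` is a design choice for the sketch: without it, a counter reset would produce a large negative rate for one interval.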