Metrics Monitored for Aviatrix Resources
This section describes the system metrics and network metrics Aviatrix Controller captures on virtual machines (instances/hosts) that Aviatrix Gateways run on. Some of the metrics can be used for triggering actions such as alerting and Gateway scaling. You can monitor the performance of these metrics for Gateway hosts from the CoPilot > Monitor > Performance page.
About Metrics that are Monitored for Aviatrix Resources
Aviatrix Controller captures system metric and network metric information about the virtual machines (instances/hosts) that Aviatrix Gateways run on.
Health-type metric information is also captured for Controller and CoPilot virtual machines.
In the CoPilot Performance page, you can use the metrics to monitor whether your resource VMs are performing well.
In the CoPilot Notifications page, you can use the metrics to be notified of events that occur in your network, such as performance bottlenecks or other problems.
You configure alerts and the channels to be notified using the CoPilot > Monitor > Notifications > Alerts Configuration page.
See Configuring Alerts.
When alert conditions are met for a metric, CoPilot records the event in the CoPilot > Monitor > Notifications > Alerts page. This page is a tabular view of alerts triggered by the Aviatrix platform, with search and filter capability.
The condition threshold you set to trigger an alert depends on several factors. For example, for system metrics, the instance size can influence the condition threshold that makes sense. For metrics associated with cloud provider-maintained infrastructure, the desired condition threshold may vary between cloud service providers. Work with your network operations team to determine the metric conditions that will trigger alerts in your environment.
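To illustrate how the same sampled value can trigger an alert on one instance size but not on another, here is a minimal Python sketch. The function `condition_met`, the comparator strings, and the threshold numbers are hypothetical illustrations, not the CoPilot alert schema or recommended values.

```python
import operator

# Hypothetical comparison operators for an alert condition (not the CoPilot schema).
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def condition_met(value, comparator, threshold):
    """Return True when the sampled metric value satisfies the alert condition."""
    return COMPARATORS[comparator](value, threshold)


# Illustrative cpu_idle thresholds per instance size; the numbers are made up.
cpu_idle_thresholds = {"small": 20.0, "large": 10.0}

sample_value = 15.0  # a sampled cpu_idle percentage
for size, threshold in cpu_idle_thresholds.items():
    fired = condition_met(sample_value, "<", threshold)
    print(f"{size}: cpu_idle < {threshold} -> {fired}")
```

With these made-up numbers, the condition is met for the "small" threshold and not for the "large" one, which is why thresholds are best chosen per environment.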
Health Metrics for Triggering Notifications or Other Actions
Health metrics include:
Name (Health Metric) | Description |
---|---|
BGP Peering Status | Any BGP peering status change triggers an alert. BGPpeeringStatus |
Connection Status | Any connection status change on the specified gateways/connections triggers an alert. ConnectionStatus |
Gateway Status | Any gateway status change triggers an alert. GatewayStatus |
Underlay Connection Status | Monitors the syslog from any connection that includes the host as the source or destination. The alert is triggered when syslog data from both directions of the connection between that host and another host indicates a potential problem within 30 seconds of each other. If syslog data from either direction of the same connection later indicates that the problem is resolved, the alert is automatically resolved. UnderlayConnectionStatus |
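The trigger and resolve behavior described for Underlay Connection Status can be sketched in Python as follows. This is only an illustration of the described logic, not the Aviatrix implementation: the class `UnderlayConnection`, the direction labels `a_to_b` and `b_to_a`, and the inclusive treatment of the 30-second window are all assumptions.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 30  # window stated in the description; inclusive bound is an assumption


@dataclass
class UnderlayConnection:
    """Hypothetical model of one connection between host A and host B."""

    # Maps a direction label to the timestamp (seconds) of its latest problem report.
    last_problem: dict = field(default_factory=dict)
    alert_active: bool = False

    def report_problem(self, direction, timestamp):
        """Record a problem seen in syslog for one direction of the connection."""
        self.last_problem[direction] = timestamp
        other = "b_to_a" if direction == "a_to_b" else "a_to_b"
        other_ts = self.last_problem.get(other)
        # Trigger only when both directions reported a problem within the window.
        if other_ts is not None and abs(timestamp - other_ts) <= WINDOW_SECONDS:
            self.alert_active = True
        return self.alert_active

    def report_resolved(self, direction):
        """A resolution seen from either direction clears the alert."""
        self.last_problem.clear()
        self.alert_active = False
        return self.alert_active


conn = UnderlayConnection()
print(conn.report_problem("a_to_b", 100))  # one direction only: no alert yet
print(conn.report_problem("b_to_a", 120))  # other direction 20 s later: alert
print(conn.report_resolved("b_to_a"))      # resolved from either direction
```

Clearing both stored timestamps on resolution is a design choice made here so that a stale report cannot retrigger the alert on its own.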
System Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following system metrics. Aviatrix gateways report live Linux system statistics (such as memory, CPU, I/O, processes, and swap) for the instances/virtual machines on which they run.
Name (System Metric) | Description |
---|---|
CPU Idle (%) | Of the total CPU time, the percentage of time the CPU(s) spent idle and waiting for tasks from the kernel. cpu_idle |
CPU Kernel Space (%) | Of the total CPU time, the percentage of time spent running kernel code. cpu_ks |
CPU Steal (%) | Of the total CPU time, the percentage of time a virtual CPU waits for a real CPU while the hypervisor services another virtual processor. cpu_steal |
CPU Used (%) | The percentage of CPU used. cpu_used_per |
CPU User Space (%) | Of the total CPU time, the percentage of time spent running non-kernel code. cpu_us |
CPU Wait (%) | Of the total CPU time, the percentage of time spent waiting for I/O. cpu_wait |
Disk Free | The storage space on the disk (volume) that is free/unused. hdisk_free |
Disk Free (%) | Of the total storage space on the disk (volume), the percentage of storage space that is free/unused. hdisk_free_per |
Disk Total | The total storage space on the disk (volume). hdisk_tot |
IO Blocks In | The number of blocks received per second from a block device. io_blk_in |
IO Blocks Out | The number of blocks sent per second to a block device. io_blk_out |
Memory Available | The amount of memory available to be allocated to new or existing processes. memory_available |
Memory Available (%) | Of the total memory, the percentage that is available to be allocated to new or existing processes. memory_available_per |
Memory Buffer | The amount of memory used as buffers. memory_buf |
Memory Cache | The amount of memory used as cache. memory_cached |
Memory Swapped | If swap is enabled, the amount of virtual memory used. memory_swpd |
Memory Total | The total memory. memory_tot |
Memory Used | The amount of memory used. memory_used |
Memory Used (%) | Of the total memory, the percentage of memory used. memory_used_per |
Processes Uninterruptible Sleep | The number of processes blocked waiting for I/O to complete. nproc_non_int_sleep |
Processes Waiting To Be Run | The number of processes that are running or waiting for run time. nproc_running |
Swaps From Disk | The amount of memory, in kilobytes, swapped in from disk per second. swap_from_disk |
Swaps To Disk | The amount of memory, in kilobytes, swapped out to disk per second. swap_to_disk |
System Context Switches | The number of context switches per second. system_cs |
System Interrupts | The number of interrupts per second, including the clock. system_int |
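The percentage metrics in the table are described relative to a total (for example, Memory Used (%) is the share of total memory that is used). The Python sketch below shows that arithmetic on a made-up sample. The helper `percent` and the sample values are hypothetical, and the units are assumed to match within each pair; how the platform itself computes these values is not stated here.

```python
def percent(part, total):
    """Return part as a percentage of total, or 0.0 when total is zero."""
    return 0.0 if total == 0 else 100.0 * part / total


# Made-up sample using the metric keys from the table (units assumed consistent).
sample = {
    "memory_tot": 8000,
    "memory_used": 6000,
    "memory_available": 1500,
    "hdisk_tot": 64,
    "hdisk_free": 16,
}

derived = {
    "memory_used_per": percent(sample["memory_used"], sample["memory_tot"]),
    "memory_available_per": percent(sample["memory_available"], sample["memory_tot"]),
    "hdisk_free_per": percent(sample["hdisk_free"], sample["hdisk_tot"]),
}

for key, value in derived.items():
    print(f"{key} = {value:.2f}")
```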
Network Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following network metrics.
Name (Network Metric) | Description |
---|---|
Bandwidth Egress Limit Exceeded | Bandwidth Egress Limit Exceeded bandwidth_egress_limit_exceeded |
Bandwidth Egress Limit Exceeded (%) | Bandwidth Egress Limit Exceeded (%) per_bandwidth_egress_limit |
Bandwidth Egress Limit Exceeded Rate | (AWS Only) The number of tx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the Elastic Network Adapter (ENA) driver only on AWS. rate_bandwidth_egress_limit_exceeded |
Bandwidth Ingress Limit Exceeded | Bandwidth Ingress Limit Exceeded bandwidth_ingress_limit_exceeded |
Bandwidth Ingress Limit Exceeded (%) | (AWS Only) The percentage of rx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the ENA driver only on AWS. per_bandwidth_ingress_limit_exceeded |
Bandwidth Ingress Limit Exceeded Rate | (AWS Only) The number of rx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the ENA driver only on AWS. rate_bandwidth_ingress_limit_exceeded |
Collisions during Transmission | The count of collisions during packet transmission. tx_colls |
Collisions Rate during Transmission | The number of collisions per second during packet transmission. rate_tx_colls |
Compressed Packets Received | The count of compressed packets received. rx_compressed |
Compressed Packets Received Rate | The number of compressed packets received per second. rate_rx_compressed |
Compressed Packets Transmitted | The count of compressed packets transmitted. tx_compressed |
Compressed Packets Transmitted Rate | The number of compressed packets transmitted per second. rate_tx_compressed |
Conntrack Allowance Available | (AWS Only) Reports the number of available tracked connections that can be established before an instance’s Connections Tracked allowance is exceeded. This metric is supplied by the ENA driver only on AWS. conntrack_allowance_available |
Conntrack Limit Exceeded | Conntrack Limit Exceeded conntrack_limit_exceeded |
Conntrack Limit Exceeded (%) | Conntrack Limit Exceeded (%) per_conntrack_limit_exceeded |
Conntrack Limit Exceeded Rate | Conntrack limit exceeded rate. rate_conntrack_limit_exceeded |
Conntrack Usage Rate | (AWS Only) The rate at which conntrack capacity is being used up, in connections per second. This metric is only available in AWS, where the Conntrack Allowance Available (conntrack_allowance_available) metric is present. conntrack_usage_rate |
Drop Rate during Transmission | The number of packets being dropped per second while sending. rate_tx_drop |
Drop Rate while Receiving | The number of packets being dropped per second while receiving. rate_rx_drop |
Errored Packets Received | The count of packets received that are flagged by the kernel as errored. rx_errs |
Errored Packets Received Rate | The number of packets received per second that are flagged by the kernel as errored. rate_rx_errs |
Errored Packets Transmitted | The total number of transmit problems. tx_errs |
Errored Packets Transmitted Rate | The total number of transmit problems per second. rate_tx_errs |
Interface Drops during Transmission (%) | Interface Drops during Transmission (%) per_tx_drop |
Interface Drops while Receiving (%) | Interface Drops while Receiving (%) per_rx_drop |
Interface Errors during Transmission (%) | Interface Errors during Transmission (%) per_tx_errs |
Interface Errors while Receiving (%) | Interface Errors while Receiving (%) per_rx_errs |
Limit Exceeded Rate (PPS) - AWS Only | The number of packets per second, processed bidirectionally by the Aviatrix gateway, that exceed the maximum for the instance type. rate_pps_limit_exceeded |
Linklocal Limit Exceeded | Linklocal Limit Exceeded linklocal_limit_exceeded |
Linklocal Limit Exceeded (%) | Linklocal Limit Exceeded (%) per_linklocal_limit_exceeded |
Linklocal Limit Exceeded Rate | Linklocal Limit Exceeded Rate rate_linklocal_limit_exceeded |
Multicast Packets Received | The count of multicast packets received. rx_multicast |
Multicast Packets Received Rate | The number of multicast packets received per second. rate_rx_multicast |
PPS Limit Exceeded | The count of bidirectional packets that exceed the maximum for the instance type and are handled by the Aviatrix gateway. pps_limit_exceeded |
PPS Limit Exceeded Drop (%) | PPS Limit Exceeded Drop (%) per_pps_limit_exceeded |
Packet Drop (%) | Packet Drop (%) per_pkt_drop |
Packet Drop Rate | The number of packets dropped per second. rate_pkt_drop (also pkt_drop_rate) |
Packet Failure (%) | Packet Failure (%) per_pkt_fail |
Packet Failure Rate | Packet Failure Rate rate_pkt_fail |
Packets Dropped during Transmission | The count of packets that were dropped during transmission, often due to resource constraints. tx_drop |
Packets Dropped while Receiving | The count of received packets that were not processed, typically due to resource limitations or unsupported protocols. rx_drop |
Peak Received Rate | Peak Received Rate rate_peak_received |
Peak Total Rate | Peak Total Rate rate_peak_total |
Peak Transmitted Rate | Peak Transmitted Rate rate_peak_sent |
Received Bytes | Received Bytes rx_bytes |
Received Frames Rate | Received Frames Rate rate_rx_frame |
Received Packets | Received Packets rx_packets |
Received Rate | The rate, in bits per second, at which data is received by the interface on the Aviatrix gateway VM/instance. rate_received |
Received Rate (PPS) | The number of packets received per second. pkt_rx_rate |
Receiver FIFO Frames | The number of overflow events when receiving packets. rx_fifo |
Receiver FIFO Frames Rate | The number of overflow events per second when receiving packets. rate_rx_fifo |
Received Frames | Received Frames rx_frame |
Total Attempted Rate | Total Attempted Rate rate_pkt_attempted |
Total Rate | The total (bidirectional) rate of bits processed per second by the interface on the Aviatrix VM/instance. rate_total |
Total Rate (in packets) | The total (bidirectional) number of packets processed per second. Instance size impacts how many packets per second the gateway can handle. pkt_rate_total |
Transmission FIFO Frames Rate | The number of frame transmission errors per second due to device FIFO underrun/underflow. rate_tx_fifo |
Transmission FIFO Frames | The number of frame transmission errors due to device FIFO underrun/underflow. tx_fifo |
Transmitted Bytes | Transmitted Bytes tx_bytes |
Transmitted Carrier Frames | Transmitted Carrier Frames tx_carrier |
Transmitted Carrier Frames Rate | Transmitted Carrier Frames Rate rate_tx_carrier |
Transmitted Packets | Transmitted Packets tx_packets |
Transmitted Rate | The rate, in bits per second, at which data is transmitted by the interface on the Aviatrix gateway VM/instance. rate_sent |
Transmitted Rate (PPS) | The number of packets transmitted per second. pkt_tx_rate |
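Many network metrics above appear in pairs: a cumulative count (such as rx_drop) and a per-second rate (such as rate_rx_drop). A common way to relate the two is to divide the change in a counter by the time between two samples. The Python sketch below shows that general technique with made-up values. The function `per_second_rate` and its handling of counter resets are assumptions for illustration, not a statement of how Aviatrix computes its rate metrics.

```python
def per_second_rate(prev_count, curr_count, interval_seconds):
    """Return the average per-second rate between two cumulative counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumption: a smaller value means the counter was reset (for example,
        # after a reboot), so count only what accumulated since the reset.
        delta = curr_count
    return delta / interval_seconds


# Two made-up samples of the rx_drop counter taken 60 seconds apart.
rate_rx_drop = per_second_rate(prev_count=1200, curr_count=1260, interval_seconds=60)
print(f"rate_rx_drop = {rate_rx_drop} packets/s")
```

When alerting on such a rate, a longer sampling interval smooths short bursts, while a shorter one reacts faster, so the interval is worth choosing together with the threshold.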