Metrics Monitored for Aviatrix Resources

The system metrics and network metrics that you can access in CoPilot are captured by the Aviatrix Controller. The Controller pulls the data from virtual machines (instances/hosts) that Aviatrix Gateways run on and feeds that data to CoPilot.

Some metrics can be used for triggering actions such as alerting and Gateway scaling. You can also use metrics to monitor the performance of Gateway hosts from the CoPilot Monitor > Performance page.

In addition, with the Aviatrix Network Insights API add-on license, you can use APIs to analyze the performance and health of your Aviatrix-managed resources in external monitoring systems. See Monitoring with Network Insights API for information about using Network Insights Metric and Status APIs.

About Metrics that are Monitored for Aviatrix Resources

Aviatrix Controller captures system metric and network metric information about the virtual machines (instances/hosts) that Aviatrix Gateways run on.

Health-type metric information is also captured for Controller and CoPilot virtual machines. See Global Control Plane Health Alert.

Metrics that are monitored by Aviatrix Controller and Aviatrix CoPilot include health metrics, system metrics, and network metrics, each described in its own section below.

On the CoPilot Monitor > Performance page, you can select metrics to monitor performance on your resource VMs.

On the CoPilot Monitor > Notifications > Alerts Configuration page, you can configure how to use the pre-existing set of metrics to send notifications about events that occur in your network, such as performance bottlenecks or other problems.

To better understand how notifications and alerts work and how to configure them in CoPilot, see the CoPilot documentation on user alerts for network events.

For more information about integrating Aviatrix metric and status APIs with external monitoring tools, see Monitoring with Network Insights API.

Health Metrics for Triggering Notifications or Other Actions

The following health metrics are available in CoPilot. They are listed in alphabetical order, by the name used in the CoPilot UI.

Name (Health Metric) | Description | Internal Metric Name | Accessible by API

BGP Peering Status

Any BGP peering status change triggers an alert.

BGPpeeringStatus

Connection Status

Any connection status change on the specified gateways/connections triggers an alert.

ConnectionStatus

Gateway Status

Any gateway status change triggers an alert.

GatewayStatus

Underlay Connection Status

Monitors the syslog from any connection that includes the host as the source or destination. The alert is triggered when syslog data from both directions of a connection between that host and another host indicates a potential problem, with the two reports arriving within 30 seconds of each other. If syslog data for the same connection later indicates that the problem is resolved in either direction, the alert is automatically resolved.

UnderlayConnectionStatus
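The trigger and resolve rules for Underlay Connection Status can be modeled as a small state machine. The Python sketch below is illustrative only. It assumes that each relevant syslog entry has already been reduced to an event carrying a timestamp, a source host, a destination host, and a problem/resolved flag. The `UnderlayAlert` class and its `observe` method are hypothetical names and are not part of CoPilot or any Aviatrix API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

CORRELATION_WINDOW_S = 30.0  # both directions must report a problem within 30 seconds


@dataclass
class UnderlayAlert:
    """Hypothetical alert state for the underlay connection between two hosts."""

    host_a: str
    host_b: str
    active: bool = False
    # Time of the most recent unresolved problem report, per direction (src, dst).
    pending: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def observe(self, ts: float, src: str, dst: str, is_problem: bool) -> bool:
        """Feed one syslog-derived event and return whether the alert is active."""
        if {src, dst} != {self.host_a, self.host_b}:
            return self.active  # event belongs to a different connection
        direction = (src, dst)
        if is_problem:
            self.pending[direction] = ts
            other = self.pending.get((dst, src))
            # Trigger only when the opposite direction also reported a problem
            # within the correlation window.
            if other is not None and abs(ts - other) <= CORRELATION_WINDOW_S:
                self.active = True
        elif self.active:
            # Once triggered, a "resolved" report from either direction
            # resolves the alert automatically.
            self.active = False
            self.pending.clear()
        else:
            # Not yet triggered: discard only this direction's pending problem report.
            self.pending.pop(direction, None)
        return self.active
```

Here is how the sketch behaves. A problem report from `gw-a` to `gw-b` at t = 100 s does not trigger the alert on its own. A problem report in the opposite direction at t = 120 s does trigger it. A later resolved report from either direction clears it. If the reverse-direction report had arrived more than 30 seconds after the first, the alert would not have fired.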

System Metrics for Triggering Notifications or Other Actions

For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following system metrics. Aviatrix gateways report live Linux system statistics (such as memory, CPU, I/O, processes, and swap) for the instances/virtual machines on which they run. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.

Name (System Metric) | Description | Internal Metric Name | Accessible by API

CPU Idle (%)

Of the total CPU time, the percentage of time the CPU(s) spent idle and waiting for tasks from the kernel.

cpu_idle

Yes

CPU Kernel Space (%)

Of the total CPU time on the host (VM/instance), the percentage of time spent running kernel code.

cpu_ks

Yes

CPU Steal (%)

Of the total CPU time on the host (VM/instance), the percentage of time a virtual CPU waited for a physical CPU while the hypervisor serviced another virtual processor.

cpu_steal

CPU Used (%)

The percentage of CPU used.

cpu_used_per

CPU User Space (%)

Of the total CPU time, the percentage of time spent running non-kernel code.

cpu_us

Yes

CPU Wait (%)

Of the total CPU time, the percentage of time spent waiting for I/O to complete.

cpu_wait

Yes

Disk Free

The storage space on the disk (volume) that is free/unused.

hdisk_free

Disk Free (%)

Of the total storage space on the disk (volume), the percentage of storage space that is free/unused.

hdisk_free_per

Disk Total

The total storage space on the disk (volume).

hdisk_tot

IO Blocks In

The number of blocks received per second from a block device.

io_blk_in

IO Blocks Out

The number of blocks sent per second to a block device.

io_blk_out

Memory Available

The available amount of memory that can be allocated to new or existing processes.

memory_available

Yes

Memory Available (%)

Of the total memory, the percentage of the available memory that can be allocated to new or existing processes.

memory_available_per

Memory Buffer

The amount of memory used as buffers.

memory_buf

Yes

Memory Cache

The amount of memory used as cache.

memory_cached

Yes

Memory Swapped

If swap is enabled, the amount of swap space (virtual memory) in use.

memory_swpd

Yes

Memory Total

The total memory.

memory_tot

Memory Used

The amount of memory used.

memory_used

Memory Used (%)

Of the total memory, the percentage of memory used.

memory_used_per

Processes Uninterruptible Sleep

The number of processes blocked waiting for I/O to complete.

nproc_non_int_sleep

Processes Waiting To Be Run

The number of processes that are running or waiting for run time.

nproc_running

Swaps From Disk

The amount of memory swapped in from disk per second, in kilobytes.

swap_from_disk

Swaps To Disk

The amount of memory swapped out to disk per second, in kilobytes.

swap_to_disk

System Context Switches

The number of context switches per second.

system_cs

System Interrupts

The number of interrupts per second, including the clock.

system_int
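Each CPU percentage metric above is a share of total CPU time. On Linux, a common way to derive such values is the one used by tools such as vmstat. You sample the cumulative tick counters on the aggregate `cpu` line of `/proc/stat` twice, then divide each field's delta by the total delta.

The Python sketch below assumes that approach. It counts nice time as user space and irq/softirq time as kernel space, which is one common convention. It is not necessarily how Aviatrix gateways compute `cpu_us`, `cpu_ks`, `cpu_idle`, `cpu_wait`, or `cpu_steal`. The function names are hypothetical.

```python
import time
from typing import Dict

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# guest and guest_nice are omitted because the kernel already counts them
# inside user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def read_cpu_counters(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the current counters (the aggregate 'cpu' line is the first line)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Express each counter delta as a percentage of total CPU time in the interval."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two samples must span a non-zero interval")

    def pct(ticks: int) -> float:
        return 100.0 * ticks / total

    return {
        # Assumption: nice time counts as user space, irq/softirq as kernel space.
        "cpu_us": pct(delta["user"] + delta["nice"]),
        "cpu_ks": pct(delta["system"] + delta["irq"] + delta["softirq"]),
        "cpu_idle": pct(delta["idle"]),
        "cpu_wait": pct(delta["iowait"]),
        "cpu_steal": pct(delta["steal"]),
    }


def sample_cpu(interval_s: float = 1.0) -> Dict[str, float]:
    """On a Linux host, take two samples interval_s apart and return percentages."""
    before = read_cpu_counters()
    time.sleep(interval_s)
    return cpu_percentages(before, read_cpu_counters())
```

As an example, consider counter deltas of 100 user, 50 system, 800 idle, and 50 iowait ticks, for 1,000 ticks in total. These give `cpu_us` = 10.0, `cpu_ks` = 5.0, `cpu_idle` = 80.0, `cpu_wait` = 5.0, and `cpu_steal` = 0.0. Working from deltas rather than raw counters is what turns the ever-increasing kernel counters into per-interval percentages.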

Network Metrics for Triggering Notifications or Other Actions

For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following network metrics. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.

Name (Network Metric) | Description | Internal Metric Name | Accessible by API

Bandwidth Egress Limit Exceeded

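The count of tx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the ENA driver only on AWS.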

bandwidth_egress_limit_exceeded

Bandwidth Egress Limit Exceeded (%)

The percentage of dropped tx packets due to exceeding the bandwidth allowance limit. This metric is specific to the ENA driver on AWS.

per_bandwidth_egress_limit

Bandwidth Egress Limit Exceeded Rate

The number of tx packets dropped per second because the bandwidth allowance limit was exceeded.

This metric is supplied by the Elastic Network Adapter (ENA) driver only on AWS.

rate_bandwidth_egress_limit_exceeded

Bandwidth Ingress Limit Exceeded

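The count of rx packets dropped because the bandwidth allowance limit was exceeded. This metric is supplied by the ENA driver only on AWS.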

bandwidth_ingress_limit_exceeded

Bandwidth Ingress Limit Exceeded (%)

The percentage of dropped rx packets due to exceeding the bandwidth allowance limit. This metric is specific to the ENA driver on AWS.

per_bandwidth_ingress_limit_exceeded

Yes

Bandwidth Ingress Limit Exceeded Rate

(AWS Only) The number of rx packets dropped per second because the bandwidth allowance limit was exceeded.

This metric is supplied by the ENA driver only on AWS.

rate_bandwidth_ingress_limit_exceeded

Collisions during Transmission

The count of collisions during packet transmission.

tx_colls

Collisions Rate during Transmission

The number of collisions per second during packet transmission.

rate_tx_colls

Compressed Packets Received

The count of compressed packets received.

rx_compressed

Compressed Packets Received Rate

The number of compressed packets received per second.

rate_rx_compressed

Compressed Packets Transmitted

The count of compressed packets transmitted.

tx_compressed

Compressed Packets Transmitted Rate

The number of compressed packets transmitted per second.

rate_tx_compressed

Conntrack Allowance Available

(AWS Only) Reports the number of available tracked connections that can be established before an instance’s Connections Tracked allowance is exceeded. This metric is supplied by the Elastic Network Adapter (ENA) driver only on AWS.

conntrack_allowance_available

Conntrack Limit Exceeded

Conntrack Limit Exceeded

conntrack_limit_exceeded

Conntrack Limit Exceeded (%)

Conntrack Limit Exceeded (%)

per_conntrack_limit_exceeded

Conntrack Limit Exceeded Rate

Conntrack limit exceeded rate.

rate_conntrack_limit_exceeded

Conntrack Usage Rate

(AWS Only) The rate at which conntrack capacity is being used up in connections per second. The Conntrack Usage Rate metric is only available in AWS where the Conntrack Allowance Available (conntrack_allowance_available) metric is present.

conntrack_usage_rate

Drop Rate during Transmission

The number of packets being dropped per second while sending.

rate_tx_drop

Yes

Drop Rate while Receiving

The number of packets being dropped per second while receiving.

rate_rx_drop

Yes

Errored Packets Received

The count of received packets that the kernel flagged as errored.

rx_errs

Errored Packets Received Rate

The number of received packets per second that the kernel flagged as errored.

rate_rx_errs

Errored Packets Transmitted

The total number of transmit problems.

tx_errs

Errored Packets Transmitted Rate

The total number of transmit problems per second.

rate_tx_errs

Interface Drops during Transmission (%)

The percentage of packets dropped during transmission.

per_tx_drop

Interface Drops while Receiving (%)

The percentage of packets dropped while receiving.

per_rx_drop

Interface Errors during Transmission (%)

The percentage of packets with errors during transmission.

per_tx_errs

Interface Errors while Receiving (%)

The percentage of received packets with errors.

per_rx_errs

Limit Exceeded Rate (PPS) - AWS Only

The number of packets per second, processed (bidirectionally) by the Aviatrix gateway, that exceed the maximum for the instance type.

rate_pps_limit_exceeded

Linklocal Limit Exceeded

Linklocal Limit Exceeded

linklocal_limit_exceeded

Linklocal Limit Exceeded (%)

Linklocal Limit Exceeded (%)

per_linklocal_limit_exceeded

Linklocal Limit Exceeded Rate

Linklocal Limit Exceeded Rate

rate_linklocal_limit_exceeded

Multicast Packets Received

The count of multicast packets received.

rx_multicast

Multicast Packets Received Rate

The number of multicast packets received per second.

rate_rx_multicast

PPS Limit Exceeded

The count of bidirectional packets that exceed the maximum for the instance type and are handled by the Aviatrix gateway.

pps_limit_exceeded

Yes

PPS Limit Exceeded Drop (%)

PPS Limit Exceeded Drop (%)

per_pps_limit_exceeded

Packet Drop (%)

The percentage of packets dropped.

per_pkt_drop

Packet Drop Rate

The number of packets dropped per second.

rate_pkt_drop

Yes

Packet Failure (%)

Packet Failure (%)

per_pkt_fail

Packet Failure Rate

Packet Failure Rate

rate_pkt_fail

Packets Dropped during Transmission

The count of packets that were dropped during transmission, often due to resource constraints.

tx_drop

Yes

Packets Dropped while Receiving

The count of received packets that were not processed, typically due to resource limitations or unsupported protocols.

rx_drop

Yes

Peak Received Rate

Peak Received Rate

rate_peak_received

Peak Total Rate

Peak Total Rate

rate_peak_total

Peak Transmitted Rate

Peak Transmitted Rate

rate_peak_sent

Received Bytes

The count of bytes received.

rx_bytes

Received Frames Rate


The number of frame alignment errors per second when receiving packets. On AWS, this may occur due to RX buffer overruns on physical interfaces, which can result in packet drops by the NIC.

rate_rx_frame

Received Packets

The count of packets received.

rx_packets

Received Rate

The number of bits per second received by the interface on the Aviatrix gateway VM/instance.

rate_received

Yes

Received Rate (PPS)

The total number of packets received per second.

pkt_rx_rate

Receiver FIFO Frames

The count of overflow events when receiving packets.

rx_fifo

Receiver FIFO Frames Rate

The number of overflow events per second when receiving packets.

rate_rx_fifo

Received Frames

The count of frame alignment errors when receiving packets.

rx_frame

Total Attempted Rate

Total Attempted Rate

rate_pkt_attempted

Total Rate

The total (bidirectional) rate of bits processed per second by the interface on the Aviatrix VM/instance.

rate_total

Yes

Total Rate (in packets)

The total (bidirectional) number of packets processed per second. Instance size affects how many packets per second the gateway can handle.

pkt_rate_total

Transmission FIFO Frames Rate

The number of frame transmission errors per second due to device FIFO underrun/underflow.

rate_tx_fifo

Transmission FIFO Frames

The number of frame transmission errors due to device FIFO underrun/underflow.

tx_fifo

Transmitted Bytes

The count of bytes transmitted.

tx_bytes

Transmitted Carrier Frames

Transmitted Carrier Frames

tx_carrier

Transmitted Carrier Frames Rate

Transmitted Carrier Frames Rate

rate_tx_carrier

Transmitted Packets

The count of packets transmitted.

tx_packets

Transmitted Rate

The number of bits per second transmitted by the interface on the Aviatrix gateway VM/instance.

rate_sent

Yes

Transmitted Rate (PPS)

The number of packets transmitted per second.

pkt_tx_rate
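Many network metrics above come in groups: a cumulative counter (for example `rx_drop`), a per-second rate (`rate_rx_drop`), and a percentage (`per_rx_drop`). The table does not spell out the formulas, so the Python sketch below assumes the usual conventions:

- A rate is the counter delta divided by the sampling interval.
- A drop percentage is drops divided by all packets seen in the same interval.

It also derives a conntrack usage rate from two samples of `conntrack_allowance_available`, assuming "usage" means the decrease in available allowance per second. The `InterfaceSample` type and the `counter_rate` and `derive_metrics` functions are hypothetical. Treat the percentage denominator in particular as an assumption that may differ from how CoPilot computes it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InterfaceSample:
    """Hypothetical snapshot of cumulative counters taken at one point in time."""

    timestamp: float  # seconds since an arbitrary epoch
    rx_packets: int
    rx_drop: int
    conntrack_allowance_available: int


def counter_rate(prev: int, curr: int, interval_s: float) -> float:
    """Per-second rate of a cumulative counter between two samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # A negative delta means the counter was reset (for example, after a reboot);
    # report 0 for that interval instead of a negative rate.
    return max(curr - prev, 0) / interval_s


def derive_metrics(prev: InterfaceSample, curr: InterfaceSample) -> Dict[str, float]:
    """Derive rate, percentage, and usage-rate values from two snapshots."""
    interval = curr.timestamp - prev.timestamp
    drops = max(curr.rx_drop - prev.rx_drop, 0)
    delivered = max(curr.rx_packets - prev.rx_packets, 0)
    # Assumption: rx_packets excludes dropped packets, so the denominator is their sum.
    seen = delivered + drops
    return {
        "rate_rx_drop": counter_rate(prev.rx_drop, curr.rx_drop, interval),
        "per_rx_drop": 100.0 * drops / seen if seen else 0.0,
        # Positive while the available conntrack allowance shrinks (capacity being
        # used up); negative while connections close faster than new ones are tracked.
        "conntrack_usage_rate": (prev.conntrack_allowance_available
                                 - curr.conntrack_allowance_available) / interval,
    }
```

Consider two samples taken 10 seconds apart. Between them, `rx_packets` grows by 990, `rx_drop` grows by 10, and `conntrack_allowance_available` falls from 50,000 to 49,500. The sketch reports 1.0 drops per second, a 1.0% drop percentage, and a conntrack usage rate of 50 connections per second. The transmit-side metrics (`tx_drop`, `rate_tx_drop`, `per_tx_drop`) would follow the same pattern.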