TSO, GRO, RSS, and Blacklist Feature on Avi Vantage

Overview

This guide explains the TSO, GRO, RSS and blacklist features on Avi Vantage.

TCP Segmentation Offload (TSO)

TCP segmentation offload is used to reduce the CPU overhead of TCP/IP on fast networks. A host with TSO-enabled hardware sends TCP data to the NIC (network interface card) without segmenting the data in software. This type of offload relies on the NIC to segment the data and then add the TCP, IP, and data link layer headers to each segment. When an Avi Service Engine (SE) is running in DPDK mode, TSO can be enabled on the following NICs:

  • ixgbe, vmxnet3, i40e (starting with Avi Vantage 17.2.8)
  • Mellanox ConnectX-4 (starting with Avi Vantage 17.2.12)
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)

Generic Receive Offload (GRO)

Generic Receive Offload (GRO) is a software technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single flow into a larger packet chain before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed.

Note: The benefits of GRO are only seen if multiple packets for the same flow are received in a short span of time. If the incoming packets belong to different flows, the benefits of enabling GRO might not be seen.

The following are the interfaces on which GRO is supported by Avi Vantage in DPDK mode:

  • ixgbe, i40e, virtio, vmxnet3 (starting with Avi Vantage 17.2.8)
  • Mellanox ConnectX-4 (starting with Avi Vantage 17.2.12)
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)
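
As a point of reference outside DPDK mode, on a plain Linux host the TSO and GRO state of a kernel-managed interface can be inspected with ethtool. This is an illustrative check only (eth0 and the host prompt are placeholders); on an Avi SE running in DPDK mode, both features are controlled through the SE-group properties described in the next section.


root@host:~# ethtool -k eth0 | grep -E "tcp-segmentation-offload|generic-receive-offload"
tcp-segmentation-offload: on
generic-receive-offload: on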

Enabling GRO and TSO on an Avi SE

By default, TSO and GRO features are disabled on an Avi SE.

Notes:

  • Starting with Avi Vantage release 18.2.5, the TSO feature is enabled by default.
  • Enabling TSO/GRO is non-disruptive and it does not require an SE restart.

Log in to the Avi CLI and use the configure serviceenginegroup command to enable the TSO and GRO features.


[admin:cntrl]: > configure serviceenginegroup Default-Group

Updating an existing object. Currently, the object is:

| disable_gro                           | True    |

| disable_tso                           | True    |


[admin:cntrl]: serviceenginegroup> no disable_gro

Overwriting the previously entered value for disable_gro

[admin:cntrl]: serviceenginegroup> no disable_tso

Overwriting the previously entered value for disable_tso

[admin:cntrl]: serviceenginegroup> save

| disable_gro                           | False    |

| disable_tso                           | False    |

To verify that the features are turned on in the SE, check the following statistics in the Avi Controller CLI.

GRO statistics are part of the interface statistics. For GRO, check the statistics for the following parameter:

  • gro_mbufs_coalesced (reported per queue and as a per-NIC total)

TSO statistics are part of the mbuf statistics. For TSO, check the statistics for the following parameters:

  • num_tso_bytes
  • num_tso_chains

Execute the show serviceengine <Service Engine IP address> interface command and filter the output using the grep command, as shown below:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface  | grep gro

|       gro_mbufs_coalesced          | 1157967  |

|     gro_mbufs_coalesced            | 1157967  |

Note: The sample output above is for a single queue (no RSS).

Refer to the output below for an RSS-enabled interface with a 4-queue RSS.

Note: In the case of a port-channel interface, provide the relevant physical interface name as the filter in the intfname option. For reference, see the output below for the eth4 interface.


show serviceengine 10.1.1.1 interface filter intfname eth4 | grep gro

|       gro_mbufs_coalesced          | 320370   |

|       gro_mbufs_coalesced          | 283307    |

|       gro_mbufs_coalesced          | 343143    |

|       gro_mbufs_coalesced          | 217442    |

|     gro_mbufs_coalesced            | 1164262   |

Note: The statistics for a NIC are the sum of the statistics of each queue for that interface.
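
For instance, in the 4-queue output above, the per-queue values add up to the per-NIC total: 320370 + 283307 + 343143 + 217442 = 1164262.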


[admin:cntrl]: > show serviceengine 10.1.1.1 mbufstats | grep tso

| num_tso_bytes                    | 4262518516                          |

| num_tso_chains                   | 959426                              |

If the features are enabled, the TSO parameters in the output above show non-zero values.

Multi-queue Support

The dispatcher on Avi Vantage is responsible for fetching incoming packets from a NIC, sending them to the appropriate core for proxy work, and sending outgoing packets back to the NIC. A 40G NIC, or even a 10G NIC receiving traffic at a high packets-per-second (PPS) rate (for instance, small UDP packets), might not be processed efficiently by a single-core dispatcher. This problem can be solved by distributing traffic from a single physical NIC across multiple queues, where each queue is processed by a dispatcher on a different core. Receive Side Scaling (RSS) enables the use of multiple queues on a single physical NIC.

The rest of this section is structured as follows:

  • The RSS feature and how to enable it on Avi Vantage.
  • Configurable dispatchers, which allow you to configure the number of dispatchers and thus effectively set the number of receive and transmit queues.

Enabling Multi-Queue Property for SE Image in OpenStack Cloud

You can configure multi-queue in OpenStack. You need to enable the hw_vif_multiqueue_enable flag in the OpenStack cloud configuration. For more details on configuring multi-queue in OpenStack, refer to the OpenStack Cloud Advanced Configuration Options guide.
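
The following is an illustrative sketch only, assuming the standard OpenStack client and the Glance image property commonly used for virtio multi-queue; the property name and image identifier here are placeholders, so confirm the exact flag in the OpenStack Cloud Advanced Configuration Options guide before using it.


# Illustrative only: set the multi-queue property on the SE image (image name/UUID is a placeholder)
openstack image set --property hw_vif_multiqueue_enabled=true <se-image-uuid>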

Receive Side Scaling (RSS)

When RSS is enabled on Avi Vantage, NICs make use of multiple queues in the receive path. The NIC pins flows to queues, placing packets that belong to the same flow in the same queue. This helps the driver spread packet processing across multiple CPUs, thereby improving efficiency. On an Avi SE, the multi-queue feature is also enabled on the transmit side, i.e., different flows are pinned to different queues (packets belonging to the same flow stay in the same queue) to distribute packet processing among CPUs. Avi Vantage supports the multi-queue feature on the following Ethernet adapters:

  • Intel: 82599, X520, X540, X550, X552, XL710, X710
  • Mellanox: ConnectX-4
  • Broadcom BCM574XX (validated on BCM57414) family (starting with Avi Vantage 18.2.8)

Note: The multi-queue feature (RSS) is not supported along with IPv6 addresses. If RSS is enabled, an IPv6 address cannot be configured on any of the supported interfaces listed above. Similarly, if an IPv6 address is already configured on any of those interfaces, the multi-queue feature (RSS) cannot be enabled on them.

Enabling RSS on an Avi SE

The distribute_queues knob in the SE-group properties enables and disables RSS. Log in to the Avi CLI and use the distribute_queues command to enable the RSS feature.

Note: Any change in the distribute_queues parameter requires an SE restart.


| distribute_queues | False  |

[admin:cntrl]: serviceenginegroup> distribute_queues

Overwriting the previously entered value for distribute_queues

[admin:cntrl]: serviceenginegroup> save

| distribute_queues | True   |

When RSS is turned on, all the NICs on the SE are configured to use an optimum number of queue pairs, as calculated by the SE. The calculation of this optimum number is described in the section on configurable dispatchers.

For instance, the output of a 4-queue RSS-supported interface will be as follows:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep ifq

|     ifq_stats[1]                   |

|     ifq_stats[2]                   |

|     ifq_stats[3]                   |

|     ifq_stats[4]                   |

As shown above, a 4-queue RSS configuration reports an ifq_stats entry for each of the four queues.

The counters for ipackets (input packets) and opackets (output packets) per interface queue will have non-zero values, as shown below:


[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep pack

|     ipackets                       | 40424864                            |

|     opackets                       | 42002516                            |

|       ipackets                     | 10108559                            |

|       opackets                     | 11017612                            |

|       ipackets                     | 10191660                            |

|       opackets                     | 10503881                            |

|       ipackets                     | 9873611                             |

|       opackets                     | 10272103                            |

|       ipackets                     | 10251034                            |

|       opackets                     | 10208920                            |

The output shows the statistics for each queue as well as one combined set of statistics for the NIC.

Setting Dispatchers on Service Engines

With the configurable-dispatcher feature, you can configure the number of dispatchers used in the Service Engine.

Note: Starting with Avi Vantage version 18.2.8, the distribute_queues parameter used to enable the RSS mode of operation for multiple dispatchers has been deprecated. Refer to the section Setting Multiple Queues per Dispatcher for the relevant command in 18.2.8 and later.

The number of dispatcher cores that you can configure is limited to powers of two, with a maximum of 16 dispatcher cores. In other words, you can configure only values from the set {0, 1, 2, 4, 8, 16}. If the value is set to 0 (the default), an optimum number of dispatcher cores is deduced automatically. The limitation to values of the form 2^n comes from some NICs that allow the number of RSS queues to be only a power of 2.

Refer to the mlx4 PMD Known Issues section of the Mellanox DPDK Release Notes for details on why the number of configured RSS queues must be a power of 2.

Use the num_dispatcher_cores command in the SE-group properties to configure the number of dispatcher cores. By default, num_dispatcher_cores is set to 0.
Log in to the Avi CLI and set num_dispatcher_cores to the desired value. When RSS is enabled, this effectively sets the number of RSS queues, and hence the number of dispatchers, to the configured value.

Notes:

  • Any change in the num_dispatcher_cores parameter requires an SE restart for the configuration to take effect.

  • num_dispatcher_cores can be set to zero only for Linux server cloud (LSC) in DPDK mode. In non-DPDK mode, and for DPDK in other cloud setups, the value has to be set explicitly.

Configuration Samples

The example below shows the configuration on a bare-metal machine with 24 vCPUs, two 10G NICs, one bonded interface of two 10G NICs, and distribute_queues enabled.

  • Set the value of the num_dispatcher_cores parameter to 8.

 [admin:cntrl]: serviceenginegroup> num_dispatcher_cores 8
 Overwriting the previously entered value for num_dispatcher_cores
 [admin-ctrlr]: serviceenginegroup> save 

  [admin:cntrl]:> show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
  |num_dispatcher_cpu                   | 8
  |num_queues                           | 8 
  • Set the value of the num_dispatcher_cores parameter to 0 (the default value).
    After restarting the SE, even though the configured value for dispatchers is 0, the number of queues, and hence the number of dispatchers, changes to 4 as shown below.

 [admin:cntrl]:> show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
  |num_dispatcher_cpu                   | 4
  |num_queues                           | 4 

To further optimize the system performance, the Avi Controller’s configuration is overridden in the following two scenarios:

  • For a bare-metal machine with more than 4 vCPUs, the dedicated dispatcher is turned on automatically.
  • For a system with a sufficient number of cores and only 10G interfaces, if the number of dispatcher cores configured is 0, RSS is turned off even if you have turned it on.

Note: A single dispatcher core is capable of processing I/O of 10 Gbps. This, combined with other parameters such as the total NIC capacity and the number of available cores, is used to automatically calculate the optimum number of dispatchers.
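
As an illustration only (an inference from the sample above, not a formula documented by the product): the bare-metal sample has four physical 10G ports in total (two standalone NICs plus a bond of two), that is 40 Gbps of NIC capacity, and 40 Gbps divided by 10 Gbps per dispatcher core gives 4, which matches the num_dispatcher_cpu value of 4 deduced automatically when num_dispatcher_cores is left at 0.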

Setting Multiple Queues per Dispatcher

Starting with version 18.2.8, Avi Vantage supports configuring more than one queue per dispatcher. This feature introduces the ability to utilise more than one queue per dispatcher in DPDK mode of operation and is applicable to environments with shallow interface ring sizes.

The following table shows the support matrix for this feature on different ecosystems:

Ecosystem          | Availability
OpenStack          | Yes
Linux Server Cloud | Yes
KVM                | Yes
AWS                | Tech Preview

The max_queues_per_vnic parameter in SE-group properties allows configuring the maximum number of queues per dispatcher.

The following are the supported values for max_queues_per_vnic:

  • One (Reserved) - One queue per NIC (default)

  • Integer value (2, 4, 8, or 16) - The specified maximum number of queues per vNIC

  • Zero (Reserved) - Auto (deduces the optimal number of queues per dispatcher based on the NIC and operating environment)

By setting the max_queues_per_vnic value to 0:

  • Environments like bare metal can retain the same behaviour, where the number of queues is the same as the number of dispatchers.
  • In environments like OpenStack and AWS, the number of queues can be more than the number of dispatchers.

Note: With the Avi Vantage 18.2.8 migration routine, max_queues_per_vnic will be set equal to num_dispatcher_cores if num_dispatcher_cores is greater than 0.

The following are the two modes of operation:

  • Compact Mode — A single dispatcher manages all the vNIC queues (OpenStack and KVM).
  • Distributed Mode — Multiple dispatchers each manage a subset of vNIC queues (all other supported environments).

To configure max_queues_per_vnic, use the following command:


[admin:admin-controller-1]: serviceenginegroup> max_queues_per_vnic

INTEGER 0,1,2,4,8,16    Maximum number of queues per vnic Setting to '0' utilises all queues that are distributed across dispatcher cores.

[admin:admin-controller-1]: > configure serviceenginegroup Default-Group
Updating an existing object. Currently, the object is:
+-----------------------------------------+---------------------------------------------------------+
| Field                                   | Value                                                   |
+-----------------------------------------+---------------------------------------------------------+
[output truncated]
| se_rum_sampling_nav_percent             | 1                                                       |
| se_rum_sampling_res_percent             | 100                                                     |
| se_rum_sampling_nav_interval            | 1 sec                                                   |
| se_rum_sampling_res_interval            | 2 sec                                                   |
| se_kni_burst_factor                     | 2                                                       |
| max_queues_per_vnic                     | 1                                                       |
| core_shm_app_learning                   | False                                                   |
| core_shm_app_cache                      | False                                                   |
| pcap_tx_mode                            | PCAP_TX_AUTO                                            |
+-----------------------------------------+---------------------------------------------------------+
[admin:admin-controller-1]: serviceenginegroup> max_queues_per_vnic 2
Overwriting the previously entered value for max_queues_per_vnic
[admin:admin-controller-1]: serviceenginegroup> save

The show serviceengine [se] seagent command displays the number of queues per dispatcher and the total number of queues per interface.

 show serviceengine [se] seagent

| num_dp_heartbeat_miss                | 0                                      |
| se_registration_count                | 2                                      |
| se_registration_fail_count           | 0                                      |
| num_dispatcher_cpu                   | 1                                      |
| ------------------ truncated output--------------------|
| num_flow_cpu                         | 1                                      |
| num_queues                           | 1                                      |
| num_queues_per_dispatcher            | 1                                      |
+---------------------------------------+-----------------+

RSS Scale Out

  • Both Layer 2 and Layer 3 scale out are supported.
  • No asymmetric combinations are supported.
    • RSS enabled on one SE and disabled on the other SE is not supported.
    • In a pre-existing scale-out setup, any configuration change that alters the RSS state on either of the SEs is not supported. For instance, changing the RSS-supported interface or the distribute_queues parameter (as discussed in the previous section) is not supported.

TCP/UDP Virtual Service Profile | Auto Gateway | RSS Scale-out Notes
TCP                             | Yes          | NA
TCP per packet                  | Yes          | Inefficient for Layer 2 scale out, since all packets coming to the secondary SE are handled by one dispatcher core. For efficiency, disable auto gateway in the virtual service configuration from the Avi user interface.
UDP fast path                   | Yes/No       | Layer 2 scale out is not supported. All incoming packets are handled by the primary SE.

Blacklisting Feature

In a Linux server cloud environment, if NICs have to be blacklisted (left unclaimed by the SE/DPDK), specify the PCI BDFs (domain:bus:device.function) of the NICs in the /etc/blacklist file on the host (outside the SE container). Update the file before logging in to the Avi SE. If you are already logged in to the SE, restart the SE for the configuration to take effect.


FAQ on Blacklisting

Q. How do you find out the BDF of a NIC?

If the eth9 interface has to be blacklisted, specify the string shown against bus-info in the output of ethtool -i eth9 in /etc/blacklist.

If /etc/blacklist is not present, it has to be created. An example is shown below.


root@10.1.1.1:~# ethtool -i eth9
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# cat /etc/blacklist
0000:1c:00.0
root@10-1-1-1:~#

Q. How do you blacklist multiple NICs?

Specify BDFs separated by a comma with no spaces in between. An example is shown below.


root@10.1.1.1:~# ethtool -i eth8
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# ethtool -i eth9
driver: vmxnet3
version: 1.4.7.0-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10.1.1.1:~# cat /etc/blacklist
0000:1c:00.0,0000:1b:00.0
root@10.1.1.1:~#

Q. Will the VLAN interfaces of a blacklisted NIC be claimed by the SE?

No, the VLAN interfaces associated with a blacklisted NIC remain unclaimed.

Q. What is the expected behavior when a blacklisted NIC is part of a port-channel?

Blacklisted NICs do not take part in the port-channels claimed by an SE.

Q. How many NICs per host can be blacklisted?

Up to 39 NICs can be blacklisted per host.