TSO, GRO, RSS, and Blacklist Feature on Avi Vantage
TCP Segmentation Offload (TSO)
TCP segmentation offload (TSO) reduces the CPU overhead of TCP/IP on fast networks. A host with TSO-enabled hardware sends TCP data to the network interface card (NIC) without segmenting the data in software. This type of offload relies on the NIC to segment the data and then add the TCP, IP, and data link layer headers to each segment. When an Avi Service Engine (SE) is running in DPDK mode, TSO can be enabled on the following NICs:
- ixgbe, vmxnet3, i40e (starting with Avi Vantage 17.2.8)
- Mellanox ConnectX-4 (starting with Avi Vantage 17.2.12)
Generic Receive Offload (GRO)
Generic Receive Offload (GRO) is a software technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single flow into a larger packet chain before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed.
Note: The benefits of GRO are only seen if multiple packets for the same flow are received in a short span of time. If the incoming packets belong to different flows, then the benefits of having GRO enabled might not be seen.
The following are the interfaces on which GRO is supported by Avi Vantage in DPDK mode:
- ixgbe, i40e, virtio, vmxnet3 (starting with Avi Vantage 17.2.8)
- Mellanox ConnectX-4 (starting with Avi Vantage 17.2.12)
Enabling GRO and TSO on an Avi SE
By default, TSO and GRO features are disabled on an Avi SE.
Note: Starting with Avi Vantage release 18.2.5, the TSO feature is enabled by default.
Log in to the Avi CLI and use the configure serviceenginegroup command to enable the TSO and GRO features.
[admin:cntrl]: > configure serviceenginegroup Default-Group
Updating an existing object. Currently, the object is:
| disable_gro | True |
| disable_tso | True |
[admin:cntrl]: serviceenginegroup> no disable_gro
Overwriting the previously entered value for disable_gro
[admin:cntrl]: serviceenginegroup> no disable_tso
Overwriting the previously entered value for disable_tso
[admin:cntrl]: serviceenginegroup> save
| disable_gro | False |
| disable_tso | False |
To verify that the features have been turned on in the SE, check the following statistics on the Avi Controller CLI.
GRO statistics are part of the interface statistics. For GRO, check the gro_mbufs_coalesced counter.
TSO statistics are part of the mbuf statistics. For TSO, check the num_tso_bytes and num_tso_chains counters.
To view the GRO statistics, run the show serviceengine <interface IP address> interface command and filter the output using the grep command as shown below.
[admin:cntrl]: > show serviceengine 10.1.1.1 interface | grep gro
| gro_mbufs_coalesced | 1157967 |
| gro_mbufs_coalesced | 1157967 |
Note: The sample output above is for a 1-queue interface (no RSS). The output further below is for an RSS-enabled interface with 4 queues.
Note: In the case of a port-channel interface, provide the relevant physical interface name as the filter in the intfname option. For reference, see the output below for the Ethernet 4 (eth4) interface.
[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname eth4 | grep gro
| gro_mbufs_coalesced | 320370 |
| gro_mbufs_coalesced | 283307 |
| gro_mbufs_coalesced | 343143 |
| gro_mbufs_coalesced | 217442 |
| gro_mbufs_coalesced | 1164262 |
Note: The statistics for a NIC are the sum of the statistics of all queues on that interface.
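The relationship between the per-queue counters and the NIC total can be checked with a short script. The following is a minimal sketch in Python and is not part of the Avi CLI. It assumes the grep output has been saved as text and that the per-queue rows come first with the NIC total as the last row, as in the eth4 sample. The function name parse_counter is hypothetical.

```python
import re

def parse_counter(cli_output, name):
    """Return every integer value reported for `name` in Avi CLI table rows."""
    pattern = re.compile(r"\|\s*" + re.escape(name) + r"\s*\|\s*(\d+)\s*\|")
    return [int(value) for value in pattern.findall(cli_output)]

# Rows copied from the eth4 sample output.
sample = """
| gro_mbufs_coalesced | 320370 |
| gro_mbufs_coalesced | 283307 |
| gro_mbufs_coalesced | 343143 |
| gro_mbufs_coalesced | 217442 |
| gro_mbufs_coalesced | 1164262 |
"""

values = parse_counter(sample, "gro_mbufs_coalesced")
# Assumption: the last row is the NIC total, the rows before it are per queue.
per_queue, nic_total = values[:-1], values[-1]
print(sum(per_queue) == nic_total)  # True
```

The same helper can be pointed at other counters, such as num_tso_bytes, to confirm that they are non-zero.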
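To view the TSO statistics, run the show serviceengine <interface IP address> mbufstats command and filter the output using the grep command as shown below.
[admin:cntrl]: > show serviceengine 10.1.1.1 mbufstats | grep tso
| num_tso_bytes | 4262518516 |
| num_tso_chains | 959426 |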
If the features are enabled, the statistics in the output mentioned above will reflect non-zero values for TSO parameters.
The dispatcher on Avi Vantage is responsible for fetching incoming packets from a NIC, sending them to the appropriate core for proxy work, and sending outgoing packets back to the NIC. A 40G NIC, or even a 10G NIC, receiving traffic at a high packets-per-second (PPS) rate (for example, small UDP packets) might not be processed efficiently by a single-core dispatcher. This problem can be solved by distributing traffic from a single physical NIC across multiple queues, where each queue is processed by a dispatcher on a different core. Receive Side Scaling (RSS) enables the use of multiple queues on a single physical NIC. The rest of this section covers the following:
- The RSS feature and how to enable it on Avi Vantage.
- The configurable dispatchers where the user is allowed to configure the number of dispatchers, thus effectively setting the number of receive and transmit queues.
Receive Side Scaling (RSS)
When RSS is enabled on Avi Vantage, NICs make use of multiple queues in the receive path. The NIC pins flows to queues and places packets belonging to the same flow in the same queue. This helps the driver spread packet processing across multiple CPUs, thereby improving efficiency. On an Avi SE, the multi-queue feature is also enabled on the transmit side; that is, different flows are pinned to different queues (packets belonging to the same flow go to the same queue) to distribute packet processing among CPUs. Avi Vantage supports the multi-queue feature on the following Ethernet adapters:
- Intel: 82599, X520, X540, X550, X552, XL710, X710
- Mellanox: ConnectX-4
Note: The multi-queue feature (RSS) is not supported with IPv6 addresses. If RSS is enabled, an IPv6 address cannot be configured on any of the supported interfaces listed above. Similarly, if an IPv6 address is already configured on any of those interfaces, the multi-queue feature (RSS) cannot be enabled on them.
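The flow-pinning idea can be illustrated with a small sketch. It does not reproduce how any particular NIC computes its RSS hash (hardware typically uses a keyed Toeplitz hash together with an indirection table). Instead, it substitutes a CRC32 over the 5-tuple purely to show that every packet of one flow maps to the same queue. The function name queue_for_flow and the addresses used are hypothetical.

```python
import zlib

def queue_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_queues):
    """Map a flow's 5-tuple to a queue index (illustrative stand-in for RSS hashing)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return zlib.crc32(key) % num_queues

flow = ("192.0.2.10", 40001, "198.51.100.5", 443, "tcp")

# Every packet of the same flow lands on the same queue.
queues_seen = {queue_for_flow(*flow, num_queues=4) for _ in range(1000)}
print(len(queues_seen))  # 1
```

Because the queue index depends only on the flow identity, packets of one flow are never reordered across queues, while distinct flows can be handled by dispatchers on different cores.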
Enabling RSS on an Avi SE
The distribute_queues knob in the SE-group properties enables and disables RSS. Log in to the Avi CLI and use the distribute_queues command to enable the RSS feature.
Note: Any change in the distribute_queues parameters requires an SE restart.
| distribute_queues | False |
[admin:cntrl]: serviceenginegroup> distribute_queues
Overwriting the previously entered value for distribute_queues
[admin:cntrl]: serviceenginegroup> save
| distribute_queues | True |
When RSS is turned ON, all the NICs in the SE configure and use an optimum number of queue pairs as calculated by the SE. The calculation of this optimum number is described in the section on configurable dispatchers.
For example, the output for a 4-queue RSS-supported interface looks as shown below.
[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep ifq
| ifq_stats |
| ifq_stats |
| ifq_stats |
| ifq_stats |
The ipackets (input packets) and opackets (output packets) counters for each interface queue will show non-zero values, as shown below.
[admin:cntrl]: > show serviceengine 10.1.1.1 interface filter intfname bond1 | grep pack
| ipackets | 40424864 |
| opackets | 42002516 |
| ipackets | 10108559 |
| opackets | 11017612 |
| ipackets | 10191660 |
| opackets | 10503881 |
| ipackets | 9873611 |
| opackets | 10272103 |
| ipackets | 10251034 |
| opackets | 10208920 |
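The output contains the statistics of each queue and one combined statistic for the NIC. In this sample, the first ipackets/opackets pair is the combined total for the NIC, and the four pairs that follow are the per-queue values.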
When the distribute_queues knob in SE-group properties is enabled, the number of RSS queues, and hence the number of dispatcher cores is deduced automatically. With the configurable-dispatcher feature, the user is also given the control to configure the number of dispatchers that can be used in the Service Engine.
Setting Dispatchers on an Avi SE
The number of dispatcher cores that a user can configure is limited to powers of two, with a maximum of 16 dispatcher cores. In other words, the user can configure only the values 0, 1, 2, 4, 8, or 16. If the value is set to 0 (the default), an optimum number of dispatcher cores is deduced automatically. The restriction to values of the form 2^n comes from some NICs that allow the number of RSS queues to be only a power of 2. Refer to the mlx4 PMD Known Issues section of the Mellanox DPDK Release Notes, which states that the number of configured RSS queues must be a power of 2.
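The rule for acceptable values can be expressed as a small check. The following is a minimal sketch in Python, not Avi's own validation logic; the function name is hypothetical and only the rule stated above (0 for automatic, otherwise a power of two up to 16) is assumed.

```python
MAX_DISPATCHER_CORES = 16

def is_valid_num_dispatcher_cores(value):
    """True if value is 0 (automatic) or a power of two no greater than 16."""
    if value == 0:
        return True
    # A positive integer is a power of two when exactly one bit is set.
    return 0 < value <= MAX_DISPATCHER_CORES and (value & (value - 1)) == 0

print([n for n in range(0, 17) if is_valid_num_dispatcher_cores(n)])
# [0, 1, 2, 4, 8, 16]
```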
A new CLI option, num_dispatcher_cores, is available in the SE-group properties. By default, num_dispatcher_cores is set to 0. Log in to the Avi CLI and set num_dispatcher_cores to the desired value. When RSS is enabled, this effectively sets the number of RSS queues, and hence the number of dispatchers, to the configured value.
Note: Any change in the num_dispatcher_cores parameter requires an SE restart to take effect.
The example below shows the configuration on a bare-metal machine with 24 vCPUs, two 10G NICs, one bonded interface of two 10G NICs, and distribute_queues enabled.
- Set the num_dispatcher_cores parameter to 8.
[admin:cntrl]: serviceenginegroup> num_dispatcher_cores 8
Overwriting the previously entered value for num_dispatcher_cores
[admin:cntrl]: serviceenginegroup> save
After restarting the SE, the number of dispatcher cores and the number of RSS queues can be checked as follows:
[admin:cntrl]: > show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
| num_dispatcher_cpu | 8 |
| num_queues | 8 |
- Set the num_dispatcher_cores parameter to 0 (the default value).
After restarting the SE, although the configured value for dispatchers is 0, the number of queues, and hence the number of dispatchers, is set to 4, as shown below.
[admin:cntrl]: > show serviceengine 10.1.1.1 seagent | grep -E "dispatcher|queues"
| num_dispatcher_cpu | 4 |
| num_queues | 4 |
To further optimize the system performance, the Avi Controller’s configuration is overridden in the following two scenarios:
- For a bare-metal machine with the number of vCPUs greater than 4, the dedicated dispatcher is turned ON automatically.
- For a system with a sufficient number of cores and only 10G interfaces, if the number of dispatcher cores configured is 0, RSS is turned OFF even though the user has turned it ON.
Note: A single dispatcher core is capable of processing an I/O of 10Gbps. This combined with other parameters like the total NIC capacity and the number of cores available is used for the automatic calculation of the optimum number of dispatchers.
RSS Scale Out
- Both Layer 2 and Layer 3 scale out are supported.
- No asymmetric combinations are supported.
- RSS enabled on one SE and disabled on the other SE is not supported.
- In a pre-existing scale-out setup, any configuration change that alters the RSS state on either SE is not supported. For example, changing the RSS-supported interface or the distribute_queues parameter (as discussed in the previous section) is not supported.
| TCP/UDP Virtual Service Profile | Auto Gateway | RSS Scale-Out Notes |
| TCP per packet | Yes | Inefficient for Layer 2 scale out since all packets coming to the secondary SE are handled by one dispatcher core. For efficiency, disable auto gateway in the virtual service configuration from the Avi user interface. |
| UDP fast path | Yes/No | Layer 2 scale out is not supported. All incoming packets are handled by the primary SE. |
In a Linux server cloud environment, to blacklist NICs (that is, leave them unclaimed by the SE/DPDK), specify the PCI BDFs (domain:bus:device.function) of the NICs in the /etc/blacklist file on the host (outside the SE container). Update this file before starting the SE. If the SE is already running, an SE restart is required for the blacklist configuration to take effect.
FAQ on Blacklisting
Q. How to find out the BDF of a NIC?
If the Ethernet 9 interface has to be blacklisted, specify the string that is against bus-info in the output of ethtool -i eth9 in /etc/blacklist. If /etc/blacklist is not present, it has to be created. An example is shown below.
root@10-1-1-1:~# ethtool -i eth9
driver: vmxnet3
version: 188.8.131.52-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# cat /etc/blacklist
0000:1c:00.0
root@10-1-1-1:~#
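Reading the bus-info value by hand is error-prone when many interfaces are involved. The following is a minimal Python sketch of extracting it from ethtool -i output. It assumes the usual one "key: value" pair per line that ethtool prints; the function name bus_info is hypothetical, and the sample text is an abbreviated copy of the eth9 output.

```python
def bus_info(ethtool_output):
    """Return the bus-info value (the PCI BDF) from `ethtool -i <interface>` output."""
    for line in ethtool_output.splitlines():
        # Split on the first colon only, since the BDF itself contains colons.
        key, _, value = line.partition(":")
        if key.strip() == "bus-info":
            return value.strip()
    raise ValueError("bus-info not found in ethtool output")

sample = """driver: vmxnet3
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
"""

print(bus_info(sample))  # 0000:1c:00.0
```

On a host, the input could come from subprocess.run(["ethtool", "-i", "eth9"], capture_output=True, text=True).stdout instead of a hard-coded string.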
Q. How to blacklist multiple NICs?
Specify BDFs separated by a comma with no spaces in between. An example is shown below.
root@10-1-1-1:~# ethtool -i eth8
driver: vmxnet3
version: 184.108.40.206-k-NAPI
firmware-version:
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# ethtool -i eth9
driver: vmxnet3
version: 220.127.116.11-k-NAPI
firmware-version:
bus-info: 0000:1c:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
root@10-1-1-1:~# cat /etc/blacklist
0000:1c:00.0,0000:1b:00.0
root@10-1-1-1:~#
Q. Will the VLAN interfaces of a blacklisted NIC be claimed by the SE?
No, the VLAN interfaces associated with a blacklisted NIC remain unclaimed.
Q. What is the expected behavior when a blacklisted NIC is part of a port-channel?
Blacklisted NICs do not take part in the port-channels claimed by an SE.
Q. How many NICs per host can be blacklisted?
A maximum of 39 NICs can be blacklisted per host.
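The formatting rules from this FAQ (comma-separated BDFs with no spaces, and the 39-NIC figure) can be combined into a small helper that builds the content of /etc/blacklist. This is a minimal Python sketch and not an Avi-provided tool. The function name blacklist_line is hypothetical, and the BDF pattern assumes the domain:bus:device.function form shown in the examples.

```python
import re

# Assumed BDF shape: 4 hex digits, 2 hex digits, 2 hex digits, function 0-7.
BDF_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")
MAX_BLACKLISTED_NICS = 39  # per-host figure stated in the FAQ

def blacklist_line(bdfs):
    """Build the comma-separated, space-free line for /etc/blacklist."""
    if len(bdfs) > MAX_BLACKLISTED_NICS:
        raise ValueError(f"at most {MAX_BLACKLISTED_NICS} NICs can be blacklisted")
    for bdf in bdfs:
        if not BDF_RE.fullmatch(bdf):
            raise ValueError(f"not a PCI BDF: {bdf!r}")
    return ",".join(bdfs)

print(blacklist_line(["0000:1c:00.0", "0000:1b:00.0"]))
# 0000:1c:00.0,0000:1b:00.0
```

The returned string can be written to /etc/blacklist on the host before the SE is started or restarted.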