Internet-Draft | Benchmarking Stateful NATxy Gateways | September 2023 |
Lencse & Shima | Expires 15 March 2024 | [Page] |
RFC 2544 has defined a benchmarking methodology for network interconnect devices. RFC 5180 addressed IPv6 specificities and it also provided a technology update, but excluded IPv6 transition technologies. RFC 8219 addressed IPv6 transition technologies, including stateful NAT64. However, none of them discussed how to apply RFC 4814 pseudorandom port numbers to any stateful NATxy (NAT44, NAT64, NAT66) technologies. We discuss why using pseudorandom port numbers with stateful NATxy gateways is a difficult problem. We recommend a solution limiting the port number ranges and using two test phases (phase 1 and phase 2). We show how the classic performance measurement procedures (e.g. throughput, frame loss rate, latency, etc.) can be carried out. We also define new performance metrics and measurement procedures for maximum connection establishment rate, connection tear down rate and connection tracking table capacity measurements.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 15 March 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
[RFC2544] has defined a comprehensive benchmarking methodology for network interconnect devices, which is still in use. It was mainly IP version independent, but it used IPv4 in its examples. [RFC5180] addressed IPv6 specificities and also added technology updates, but declared IPv6 transition technologies out of its scope. [RFC8219] addressed the IPv6 transition technologies, including stateful NAT64. It has reused several benchmarking procedures from [RFC2544] (e.g. throughput, frame loss rate), it has redefined the latency measurement, and added further ones, e.g. the PDV (packet delay variation) measurement.¶
However, none of them discussed how to apply [RFC4814] pseudorandom port numbers when benchmarking stateful NATxy (NAT44, NAT64, NAT66) gateways. We are not aware of any other RFCs that address this question.¶
First, we discuss why using pseudorandom port numbers with stateful NATxy gateways is a hard problem.¶
Then we recommend a solution.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
In its appendix, [RFC2544] has defined a frame format for test frames including specific source and destination port numbers. [RFC4814] recommends using pseudorandom and uniformly distributed values for both source and destination port numbers. However, stateful NATxy (NAT44, NAT64, NAT66) solutions use the port numbers to identify connections. The usage of pseudorandom port numbers causes different problems depending on the direction.¶
As for the client to server direction, pseudorandom source and destination port numbers could be used. However, this approach would amount to a denial of service attack against the stateful NATxy gateway, because it would exhaust its connection tracking table capacity. To illustrate this, let us see some calculations using the recommendations of RFC 4814:¶
We note that the usage of different source and destination IP addresses further increases the number of connection tracking table entries.¶
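The scale of the problem can be illustrated with a short calculation. The following is a minimal sketch, assuming that the port ranges recommended by RFC 4814 are 1024-65535 for source ports and 1-49151 for destination ports; these values are our reading of RFC 4814 and should be verified against it.

```python
# Port ranges as we read them from RFC 4814 (assumption, verify against the RFC).
SRC_PORT_MIN, SRC_PORT_MAX = 1024, 65535
DST_PORT_MIN, DST_PORT_MAX = 1, 49151


def range_size(low, high):
    """Number of integers in the inclusive range [low, high]."""
    return high - low + 1


src_ports = range_size(SRC_PORT_MIN, SRC_PORT_MAX)
dst_ports = range_size(DST_PORT_MIN, DST_PORT_MAX)

# With a single IP address pair, every port number combination may become
# a separate entry in the connection tracking table of the DUT.
combinations = src_ports * dst_ports

print(src_ports, dst_ports, combinations)
```

Under these assumptions, a single source and destination IP address pair already yields more than three billion potential connection tracking table entries.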
Section 12 of [RFC2544] requires testing first using a single protocol source and destination address pair and then also using multiple protocol addresses. We follow the same approach: first, we use a single source and destination IP address pair, and then we explain how to use multiple IP addresses.¶
Our methodology works with any IP versions to benchmark stateful NATxy gateways, where x and y are in {4, 6}. To facilitate an easy understanding, we use two typical examples: stateful NAT44 and stateful NAT64.¶
The Test Setup for the well-known stateful NAT44 (also called NAPT: Network Address and Port Translation) solution is shown in Figure 1.¶
Note: We are fully aware of [RFC6890] special purpose IP address ranges. The [RFC1918] private IP addresses are used to facilitate an easy understanding of the example. And we consider the usage of the IP addresses reserved for benchmarking absolutely legitimate.¶
The Test Setup for the also widely used stateful NAT64 [RFC6146] solution is shown in Figure 2.¶
As for transport layer protocol, [RFC2544] recommended testing with UDP, and it was kept also in [RFC8219]. For the general recommendation, we also keep UDP, thus the port numbers in the following text are to be understood as UDP port numbers. We discuss the limitation of this approach in Section 8.¶
We define the most important elements of our proposed benchmarking system as follows.¶
We make considerations regarding the necessary and the available IP addresses.¶
In Figure 1, we used the single 198.19.0.1 IPv4 address on the WAN side port of the stateful NAT44 gateway. However, in practice, not a single IP address, but an IP address range is assigned to the WAN side port of the stateful NAT44 gateways. Its required size depends on the number of client nodes and on the type of the stateful NAT44 algorithm. (The traditional algorithm always replaces the source port number, when a new connection is established. Thus it requires a larger range than the extended algorithm, which replaces the source port number only when it is necessary. Please refer to Table 1 and Table 2 of [LEN2015].)¶
When router testing is done, section 12 of [RFC2544] requires testing first using a single source and destination IP address pair, and then using destination IP addresses from 256 different networks. The 16-23 bits of the 198.18.0.0/24 and 198.19.0.0/24 addresses can be used to express the 256 networks. As we do not do router testing, we do not need to use multiple destination networks, therefore, these bits are available for expressing multiple IP addresses that belong to the same "/16" network. Moreover, both the 198.18.0.0/16 and the 198.19.0.0/16 networks can be used on the right side of the test setup as private IP addresses from the 10.0.0.0/16 network are used on its left side.¶
A possible solution for assigning multiple IPv4 addresses is shown in Figure 3. On the left side, the private IP address range is abundantly large. (We used the 16-31 bits to generate nearly 64k potential different source addresses, but the 8-15 bits are also available if needed.) On the right side, the 198.18.0.0/15 network is used, and it was cut into two equal parts. (Asymmetric division is also possible, if needed.)¶
We note that these are the potential address ranges. We discuss the actual address ranges to be used in Section 4.1.¶
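The potential IPv4 address ranges can be derived programmatically. The following is a minimal sketch using the Python standard library "ipaddress" module; the helper name nth_address is hypothetical, and the sketch does not state which half of 198.18.0.0/15 plays which role in the test setup.

```python
import ipaddress

# Left side: private range from the text (10.0.0.0/16).
left_net = ipaddress.ip_network("10.0.0.0/16")

# Right side: the benchmarking range 198.18.0.0/15 cut into two equal halves.
right_net = ipaddress.ip_network("198.18.0.0/15")
half_a, half_b = right_net.subnets(prefixlen_diff=1)


def nth_address(network, n):
    """Return the n-th address of the network (n = 0 is the network address)."""
    return network[n]


print(half_a, half_b)            # the two equal parts of the /15
print(left_net.num_addresses)    # size of the potential source range
print(nth_address(left_net, 2))  # an example potential source address
```

An asymmetric division could be sketched the same way by choosing different prefix lengths for the two parts.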
In the case of stateful NAT64, a single "/64" IPv6 prefix contains a high number of bits to express different IPv6 addresses. Figure 4 shows an example, where we used bits 96-111 for that purpose.¶
When a single IP address pair is used for testing, the number of network flows is determined by the number of source port number and destination port number combinations.¶
The Initiator SHOULD use restricted ranges for source and destination port numbers to avoid the denial of service attack like event against the connection tracking table of the DUT described in Section 2. If it is possible, the size of the source port number range SHOULD be larger (e.g. in the order of a few times ten thousand), whereas the size of the destination port number range SHOULD be smaller (may vary from a few to several hundreds or thousands as needed). The rationale is that source and destination port numbers that can be observed in the Internet traffic are not symmetrical. Whereas source port numbers may be random, there are a few very popular destination port numbers (e.g. 443, 80, etc., see [IIR2020]) and others hardly occur. And we have found that their role is also asymmetric in the Linux kernel routing hash function [LEN2020].¶
However, in some special cases the size of the source port range is limited. E.g. when benchmarking the CE and BR of a MAP-T [RFC7599] system together (as a compound system performing stateful NAT44), then the source port range is limited to the number of source port numbers assigned to each subscriber. (It could be as low as 2048 ports.)¶
When multiple IP addresses are used, then the port number ranges should be even more restricted, as the number of potential network flows is the product of the size of the source IP address range, the size of the source port number range, the size of the destination IP address range, and the size of the destination port number range. And our method requires the enumeration of all their possible combinations in test phase 1 as described in Section 4.4.¶
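The product rule can be made concrete with a small calculation. The following is a minimal sketch; the function name and the range sizes are hypothetical examples chosen only for illustration.

```python
def potential_flows(src_ip_count, src_port_count, dst_ip_count, dst_port_count):
    """Number of potential network flows (distinct four tuples)."""
    return src_ip_count * src_port_count * dst_ip_count * dst_port_count


# Single IP address pair: a large source and a small destination port range.
single_pair = potential_flows(1, 40_000, 1, 100)

# Ten source and ten destination addresses: the port ranges have to shrink
# by a factor of 100 in total to keep the same number of network flows.
multi_addr = potential_flows(10, 4_000, 10, 10)

print(single_pair, multi_addr)
```

Both settings result in four million potential network flows, which shows how quickly the port number ranges have to be narrowed when multiple IP addresses are introduced.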
The number of network flows can be used as a parameter. The performance of the stateful NATxy gateway MAY be examined as a function of this parameter as described in Section 5.1.¶
Test phase 1 serves two purposes:¶
Whereas the above two things are always necessary before test phase 2, test phase 1 can be used without test phase 2. This is done when the maximum connection establishment rate is measured (as described in Section 4.5).¶
Test phase 1 MUST be performed before all tests performed in test phase 2. The following things happen in test phase 1:¶
When test phase 1 is performed in preparation for test phase 2, the applied frame rate SHOULD be safely lower than the maximum connection establishment rate. (It implies that maximum connection establishment rate measurement MUST be performed first.) Please refer to Section 4.4 for further conditions regarding timeout and the enumeration of all possible four tuples.¶
We consider the most important Events that may happen during the operation of a stateful NATxy gateway, and the Actions of the gateway as follows.¶
Due to "black box" testing, the Tester is not able to directly examine (or delete) the entries of the connection tracking table. But the entries can be and MUST be controlled by setting an appropriate timeout value and carefully selecting the port numbers of the packets (as described in Section 4.4) to be able to produce meaningful and repeatable measurement results.¶
We aim to support the measurement of the following performance characteristics of a stateful NATxy gateway:¶
It is necessary to control the connection tracking table entries of the DUT in order to achieve clear conditions for the measurements. We can simply achieve the following two extreme situations:¶
From this point we use the following two assumptions:¶
The first extreme situation can be achieved by¶
The second extreme situation can be achieved by¶
[RFC4814] REQUIRES pseudorandom port numbers, which we believe is a good approximation of the distribution of the source port numbers a NATxy gateway on the Internet may face.¶
We note that although the enumeration of all possible four tuples is not a requirement for the first extreme situation and the usage of different four tuples in test phase 1 is not a requirement for the second extreme situation, pseudorandom enumeration of all possible four tuples in test phase 1 is a good solution in both cases. It may be generated in a computationally efficient way by preparing a random permutation of the previously enumerated all possible four tuples using Durstenfeld's random shuffle algorithm [DUST1964].¶
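The pseudorandom enumeration can be sketched as follows. This is a minimal illustration, not the code of any particular Tester: all possible four tuples are enumerated first and then permuted in place with the Durstenfeld variant of the Fisher-Yates shuffle. The function names, the address values and the port ranges are hypothetical examples.

```python
import itertools
import random


def enumerate_four_tuples(src_ips, src_ports, dst_ips, dst_ports):
    """List every (src IP, src port, dst IP, dst port) combination once."""
    return list(itertools.product(src_ips, src_ports, dst_ips, dst_ports))


def durstenfeld_shuffle(items, rng):
    """Permute items in place: walk from the last index down to 1 and swap
    each element with a uniformly chosen element at or before it."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # both ends inclusive
        items[i], items[j] = items[j], items[i]
    return items


# Hypothetical restricted ranges: 1000 source ports and 10 destination ports.
ordered = enumerate_four_tuples(
    ["10.0.0.2"], range(10000, 11000), ["198.19.0.2"], range(80, 90)
)
shuffled = durstenfeld_shuffle(list(ordered), random.Random(42))

print(len(shuffled), shuffled[:3])
```

The shuffle needs a single pass and no extra memory beyond the list of four tuples, and every four tuple is used exactly once.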
The enumeration of the four tuples in increasing or decreasing order (or in any other specific order) MAY be used as an additional measurement.¶
The maximum connection establishment rate is an important characteristic of the stateful NATxy gateway and its determination is necessary for the safe execution of test phase 1 (without frame loss) before test phase 2.¶
The measurement procedure of the maximum connection establishment rate is very similar to the throughput measurement procedure defined in [RFC2544].¶
Procedure: The Initiator sends a specific number of test frames using all different four tuples at a specific rate through the DUT. The Responder counts the frames that are successfully translated by the DUT. If the count of offered frames is equal to the count of received frames, the rate of the offered stream is raised and the test is rerun. If fewer frames are received than were transmitted, the rate of the offered stream is reduced and the test is rerun.¶
The maximum connection establishment rate is the fastest rate at which the count of test frames successfully translated by the DUT is equal to the number of test frames sent to it by the Initiator.¶
Note: In practice, we RECOMMEND the usage of binary search.¶
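The binary search can be sketched as follows. This is a minimal sketch under stated assumptions: trial_passes is a hypothetical callback that performs one elementary test at the given rate and returns True if the Responder received every frame sent by the Initiator; the lower limit is assumed to pass and the upper limit is assumed to fail; the DUT is replaced by a simulated one for the sake of the example.

```python
def max_rate_binary_search(trial_passes, lower, upper, error):
    """Return the highest passing rate found within the given error.

    lower: a rate assumed to pass; upper: a rate assumed to fail.
    The search stops when upper - lower <= error.
    """
    while upper - lower > error:
        rate = (lower + upper) / 2
        if trial_passes(rate):
            lower = rate   # no frame was lost: try a higher rate
        else:
            upper = rate   # some frames were lost: try a lower rate
    return lower


# Simulated DUT (assumption): it translates every frame up to 123456 fps.
SIMULATED_LIMIT = 123456


def simulated_trial(rate):
    return rate <= SIMULATED_LIMIT


result = max_rate_binary_search(simulated_trial, 0, 1_000_000, 1)
print(result)
```

In a real measurement each call of trial_passes is a complete run of test phase 1 with fresh (empty) connection tracking table conditions, so the number of iterations directly determines the execution time.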
Due to "black box" testing, the entries of the connection tracking table of the DUT may not be directly examined, but the presence of the connections can be checked easily by sending frames from the Responder to the Initiator in test phase 2 using all four tuples stored in the state table of the Tester (at a low enough frame rate). The arrival of all test frames indicates that the connections are really present.¶
Procedure: When all the desired N number of test frames were sent by the Initiator to the Responder at frame rate R in test phase 1 for the maximum connection establishment rate measurement, and the Responder has successfully received all the N frames, the establishment of the connections is checked in test phase 2 as follows:¶
Notes:¶
As for the traffic direction, there are three possible cases during test phase 2:¶
If the Initiator sends test frames, then it uses pseudorandom source port numbers and destination port numbers from the restricted port number ranges. (If it uses multiple source and/or destination IP addresses, then their ranges are also limited.) The responder receives the test frames, updates its state table and processes the test frames as required by the given measurement procedure (e.g. only counts them for throughput test, handles timestamps for latency or PDV tests, etc.).¶
If the Responder sends test frames, then it uses the four tuples from its state table. The reading order of the state table may follow different policies (discussed in Section 4.10). The Initiator receives the test frames, and processes them as required by the given measurement procedure.¶
As for the actual measurement procedures, we RECOMMEND using the updated ones from Section 7 of [RFC8219].¶
Connection tear down can cause significant load for the NATxy gateway. The connection tear down performance can be measured as follows:¶
The connection tear down rate can be computed as:¶
connection tear down rate = N / ( TimestampB - TimestampA)¶
The connection tear down rate SHOULD be measured for various values of N.¶
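The computation can be sketched as follows. This is a minimal sketch, assuming that TimestampA and TimestampB are expressed in seconds; the function name and the example values are hypothetical.

```python
def connection_tear_down_rate(n, timestamp_a, timestamp_b):
    """Connections torn down per second: N / (TimestampB - TimestampA)."""
    elapsed = timestamp_b - timestamp_a
    if elapsed <= 0:
        raise ValueError("TimestampB must be later than TimestampA")
    return n / elapsed


# Hypothetical example: one million connections removed in 2.5 seconds.
rate = connection_tear_down_rate(1_000_000, 10.0, 12.5)
print(rate)
```

Repeating the computation for several values of N shows whether the tear down rate depends on the number of connections.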
We assume that the content of the connection tracking table may be deleted by an out-of-band control mechanism specific to the given NATxy gateway implementation. (E.g. by removing the appropriate kernel module under Linux.)¶
We are aware that the performance of removing the entire content of the connection tracking table at one time may be different from removing all the entries one by one.¶
The connection tracking table capacity is an important metric of stateful NATxy gateways. Its measurement is not easy, because an elementary step of a validated maximum connection establishment rate measurement (defined in Section 4.6) may have only a few distinct observable outcomes, but some of them may have different root causes:¶
During test phase 1, the number of test frames received by the Responder is less than the number of test frames sent by the Initiator. It may have different root causes, including:¶
During test phase 1, the number of test frames received by the Responder equals the number of test frames sent by the Initiator. In this case the connections are validated in test phase 2. The validation may have two kinds of observable results:¶
The number of validation frames received by the Initiator is less than the number of validation frames sent by the Responder. This phenomenon may have various root causes:¶
And here is the problem: as the above three root causes are indistinguishable, it is not easy to decide whether R or r should be decreased.¶
We have some experience with benchmarking stateful NATxy gateways. When we tested iptables with a very high number of connections, the 256 GB RAM of the DUT was exhausted and it stopped responding. Such a situation may make the connection tracking table capacity measurements rather inconvenient. We include this possibility in our recommended measurement procedure, but we do not address the detection and elimination of such a situation. (E.g. how the algorithm can reset the DUT.)¶
For the connection tracking table size measurement, first we need a safe number: C0. It is a precondition that C0 number of connections can surely be stored in the connection tracking table of the DUT. Using C0, one can determine the maximum connection establishment rate using C0 number of connections. It is done with a binary search using validation. The result is: R0. The values C0 and R0 will serve as "safe" starting values for the following two searches.¶
First, we perform an exponential search to find the order of magnitude of the connection tracking table capacity. The search stops if the DUT collapses OR the maximum connection establishment rate severely drops (e.g. to its one tenth) due to doubling the number of connections.¶
Then, the result of the exponential search gives the order of magnitude of the size of the connection tracking table. Before disclosing the possible algorithms to determine the exact size of the connection tracking table, we consider three possible replacement policies of the NATxy gateway:¶
Now, we examine what happens and how many validation frames arrive in the three cases. Let the size of the connection tracking table be S, and the number of preliminary frames be N, where S is less than N.¶
If we know that the stateful NATxy gateway uses the first or second replacement policy, and we also know that both R and r rates are low enough, then the final step of determining the size of the connection tracking table is simple. If the Responder sent N validation frames and the Initiator received N' of them, then the size of the connection tracking table is N'.¶
In the general case, we perform a binary search to find the exact value of the connection tracking table capacity within E error. The search chooses the lower half of the interval if the DUT collapses OR the maximum connection establishment rate severely drops (e.g. to its half); otherwise it chooses the higher half. The search stops if the size of the interval is less than the E error.¶
The algorithms for the general case are defined using C like pseudocode in Figure 5. In practice, this algorithm may be made more efficient in a way that the binary search for the maximum connection establishment rate stops, if an elementary test fails at a rate under RS*beta or RS*gamma during the exponential search or during the final binary search for the capacity of the connection tracking table, respectively. (This saves a lot of execution time by eliminating the long lasting tests at low rates.)¶
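The two searches can also be sketched in Python. The following is our own minimal sketch of the idea described above, not a transcription of the pseudocode of Figure 5. The callback max_rate_for is hypothetical: it stands for a complete validated binary search for the maximum connection establishment rate with the given number of connections, and it returns None if the DUT collapsed. The DUT is simulated, and the values of C0, R0, beta, gamma and E are examples only. The final loop stops when the interval is not larger than E, which keeps the integer search from stalling.

```python
def find_capacity(max_rate_for, c0, r0, beta, gamma, error):
    """Return an interval (low, high) that contains the table capacity."""
    # Exponential search: double the number of connections until the DUT
    # collapses or the rate drops below beta times the last good rate.
    c, r = c0, r0
    while True:
        rs = max_rate_for(2 * c)
        if rs is None or rs < r * beta:
            break
        c, r = 2 * c, rs
    # Binary search between c (known good) and 2*c (known bad).
    low, high = c, 2 * c
    while high - low > error:
        mid = (low + high) // 2
        rs = max_rate_for(mid)
        if rs is None or rs < r * gamma:
            high = mid          # capacity exceeded: take the lower half
        else:
            low, r = mid, rs    # still fits: take the higher half
    return low, high


# Simulated DUT (assumption): its rate collapses above 3 million connections.
SIMULATED_CAPACITY = 3_000_000


def simulated_max_rate(connections):
    return 1000.0 if connections <= SIMULATED_CAPACITY else 50.0


low, high = find_capacity(simulated_max_rate, 1_000_000, 1000.0, 0.1, 0.5, 1000)
print(low, high)
```

The sketch omits the early termination at rates under RS*beta or RS*gamma; in a real Tester that shortcut would live inside the function that max_rate_for stands for.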
As for writing policy of the state table of the Responder, we RECOMMEND round robin, because it ensures that its entries are automatically kept fresh and consistent with that of the connection tracking table of the DUT.¶
The Responder can read its state table in various orders, for example:¶
We RECOMMEND pseudorandom to follow the spirit of [RFC4814]. Round robin may be used as a computationally cheaper alternative.¶
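The recommended policies can be sketched as follows. This is a minimal illustration of a state table with round robin writing and pseudorandom reading; the class and method names are hypothetical, and a real Tester would store the complete four tuples.

```python
import random


class StateTable:
    """Fixed size table of four tuples kept by the Responder."""

    def __init__(self, size, seed=None):
        self.size = size
        self.entries = []
        self.write_index = 0
        self.rng = random.Random(seed)

    def store(self, four_tuple):
        """Round robin writing: the oldest entry is overwritten first."""
        if len(self.entries) < self.size:
            self.entries.append(four_tuple)
        else:
            self.entries[self.write_index] = four_tuple
        self.write_index = (self.write_index + 1) % self.size

    def read_pseudorandom(self):
        """Pseudorandom reading: pick any stored entry uniformly."""
        return self.rng.choice(self.entries)


table = StateTable(size=3, seed=1)
for flow_id in range(5):      # integers stand in for four tuples here
    table.store(flow_id)

print(table.entries, table.read_pseudorandom())
```

With round robin writing, the entries learned most recently replace the oldest ones, which is what keeps the state table consistent with the connection tracking table of the DUT.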
As for scalability measurements, we do not define any new type of performance metrics, but we recommend performing measurement series in which the value of one or more parameters is changed to discover how the various values of the given parameters influence the performance of the DUT.¶
The aim of the scalability measurements is to quantify how the performance of the stateful NATxy gateways degrades with the increase of the number of network flows.¶
As for the actual values for the number of network flows to be used during the measurement series, we RECOMMEND using some representative values from the range of the potential number of network flows the DUT may be faced with during its intended usage.¶
It is important how the given number of network flows is generated. The sizes of the ranges of the source and destination IP addresses and port numbers are essential parameters to be reported together with the results. Please see also Section 6 about the reporting format.¶
If a single IP address pair is used, then we RECOMMEND to use¶
Stateful NATxy gateways are often implemented in software that is not bound to a specific hardware but can be executed by commodity servers. To facilitate the comparison of their performance, it can be useful to determine¶
If the number of the available CPU cores is a power of two, then we recommend performing the tests with 1, 2, 4, 8, 16, etc. number of active CPU cores of the DUT.¶
Measurements MUST be executed multiple times to achieve statistically reliable results. The report of the results MUST contain the number of the repetitions of the measurements. We RECOMMEND median as the summarizing function of the results complemented with the first percentile and the 99th percentile as indices of the dispersion of the results. Average and standard deviation MAY also be reported.¶
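The recommended summary can be computed with the Python standard library. The following is a minimal sketch; the function name and the sample values are hypothetical, and the "inclusive" quantile method is our choice so that the percentiles never fall outside the range of the measured values.

```python
import statistics


def summarize(results):
    """Median plus 1st and 99th percentiles of repeated measurements."""
    # quantiles() with n=100 returns the 99 cut points between percentiles.
    percentiles = statistics.quantiles(results, n=100, method="inclusive")
    return {
        "repetitions": len(results),
        "median": statistics.median(results),
        "p1": percentiles[0],
        "p99": percentiles[98],
    }


# Hypothetical results of 20 repetitions (connections per second).
samples = [
    910_000, 912_500, 905_000, 915_000, 908_000,
    911_000, 909_500, 913_000, 907_000, 914_000,
    910_500, 906_000, 912_000, 909_000, 911_500,
    908_500, 913_500, 907_500, 910_200, 909_800,
]
summary = summarize(samples)
print(summary)
```

Average and standard deviation could be added to the same dictionary with statistics.mean and statistics.stdev.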
All parameters and settings that may influence the performance of the DUT MUST be reported. Some of them may be specific to the given NATxy gateway implementation, like the "hashsize" (hash table size) and "nf_conntrack_max" (number of connection tracking table entries) values for iptables or the limit of the number of states for OpenBSD PF (set by the "set limit states number" command in the pf.conf file).¶
Figure 6 shows an example for table headings for reporting the measurement results for the scalability of the iptables stateful NAT44 implementation against the number of sessions. We have indicated the always required fields (req.) and the implementation specific ones (i.s.). In row 6, we also added a computed value, the number of sessions per hashsize ratio, which helps the reader to interpret the achieved maximum connection establishment rate. (A lower value results in shorter linked lists hanging on the entries of the hash table and thus facilitates higher performance. The ratio varies, because the number of sessions is always a power of 10, whereas the hash table size is a power of 2.) To reflect the accuracy of the results, we have also added the value of the "error" of the binary search, which expresses the stopping criterion for the binary search. The binary search stops when the difference of the "higher limit" and "lower limit" of the binary search is less than or equal to "error".¶
The table MUST be complemented with reporting the relevant parameters of the DUT. If the DUT is a general purpose computer and some software NATxy gateway implementation is tested, then hardware description SHOULD include: computer type, CPU type and number of active CPU cores, memory type, size and speed, network interface card type (reflecting also the speed), the fact that direct cable connections were used or the type of the switch used for interconnecting the Tester and the DUT. Operating system type and version, kernel version, and the version of the NATxy gateway implementation (including last commit date and number if applicable) SHOULD also be given.¶
The "stateful" branch of siitperf [SIITPERF] is an implementation of this concept. It is documented in this (open access) paper [LEN2022].¶
The proposed benchmarking methodology has been validated by performing benchmarking measurements with three radically different stateful NAT64 implementations (Jool, tayga+iptables, OpenBSD PF) in (open access) paper [LEN2023].¶
Our further experience with this methodology using siitperf for measuring the scalability of the iptables stateful NAT44 and Jool stateful NAT64 implementations is described in [I-D.lencse-v6ops-transition-scalability].¶
Stateful NATxy solutions handle TCP and UDP differently, e.g. iptables uses 30s timeout for UDP and 60s timeout for TCP. Thus benchmarking results produced using UDP do not necessarily characterize the performance of a NATxy gateway well enough when it is used for forwarding Internet traffic. As for the given example, timeout values of the DUT may be adjusted, but it requires extra consideration.¶
Other differences in handling UDP or TCP are also possible. Thus we recommend that further investigations be performed in this field.¶
As a mitigation of this problem, we recommend that testing with protocols using TCP (like HTTP and HTTPS) be performed as described in [I-D.ietf-bmwg-ngfw-performance]. This approach also solves the potential problem of protocol helpers that may be present in the stateful DUT.¶
The authors would like to thank Al Morton, Sarah Banks, Edwin Cordeiro, Lukasz Bromirski, Sándor Répás, Tamás Hetényi, Timothy Winters, and Eduard Vasilenko for their comments.¶
This work was supported by the Japan Trust International Research Cooperation Program of the National Institute of Information and Communications Technology (NICT), Japan.¶
This document does not make any request to IANA.¶
We have no further security considerations beyond that of [RFC8219]. Perhaps they should be cited here so that they be applied not only for the benchmarking of IPv6 transition technologies, but also for the benchmarking of stateful NATxy gateways.¶
Initial version.¶
Updates based on the comments received on the BMWG mailing list and minor corrections.¶
Section 4.4 was completely re-written. As a consequence, the occurrences of the now undefined "mostly different" source port number destination port number combinations were deleted from Section 4.5, too.¶
Added Section 4.3 about the consideration of the cases of stateful operation.¶
Consistency checking. Removal of some parts obsoleted by the previous re-writing of Section 4.4.¶
Added Section 4.8 about the method for measuring connection tear down rate.¶
Updates for Section 7 about the implementation and experience.¶
Update of the abstract.¶
Added Section 4.6 about validation of connection establishment.¶
Added Section 4.9 about the method for measuring connection tracking table capacity.¶
Consistency checking and corrections.¶
Added measurement setup for Stateful NAT64 gateways.¶
Consistency checking and corrections.¶
Added Section 4.5.1 about typical types of measurement series and reporting format.¶
Added the usage of multiple IP addresses.¶
Section 4.5.1 was removed and split into two Sections: Section 5 about scalability measurements and Section 6 about reporting format.¶
Updated the usage of multiple IP addresses.¶
Test phases were renamed as follows:¶
Minor updates to Section 3.2 and Section 7.¶