Internet-Draft | Inband Flow Learning Framework | July 2023 |
Han, et al. | Expires 28 January 2024 |
On-path telemetry techniques can provide high-precision inband flow insight and real-time network performance monitoring by embedding instructions or metadata into user packets. They are beneficial but still have deployability and flexibility problems in large-scale deployment scenarios. This document proposes a reference framework called Inband Flow Learning (IFL), which outlines the architecture and functional modules for automatic deployment and adjustment of flow-oriented monitoring using on-path telemetry techniques, as a reference solution to these problems. This document also provides different deployment approaches and considerations for practical network deployment.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 28 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
On-path telemetry techniques described in [I-D.song-opsawg-ifit-framework], such as IOAM [RFC9197] and Alternate-Marking [RFC9341], can provide high-precision inband flow insight and real-time network performance monitoring (e.g., jitter, latency, packet loss) by embedding instructions or metadata into user packets. They are beneficial for network operation because they monitor live traffic running in the network, based on inband flow information telemetry along the entire forwarding path.¶
However, when flow-oriented monitoring using on-path telemetry techniques is deployed on live traffic, changes in flow characteristics or paths may occur, which make the traditional static configuration mode no longer applicable. [I-D.hwyh-ippm-ps-inband-flow-learning] states the problems of flow identification when applying on-path telemetry techniques in real network scenarios, and describes the requirements for an inband flow learning mechanism which is intended to address the problems of deployability and flexibility. This document proposes a reference framework called Inband Flow Learning (IFL), which outlines the architecture and functional modules for automatic deployment and adjustment of flow-oriented monitoring using on-path telemetry techniques. This document also provides different deployment approaches and considerations for practical network deployment. Note that this document focuses on the generation of inband flow telemetry objects; inband flow performance measurement methods are out of the scope of this document.¶
IFL: Inband Flow Learning¶
IFITI: Inband Flow Information Telemetry Instance¶
The domain of inband flow information telemetry consists of ingress nodes, transit nodes and egress nodes. The ingress nodes are responsible for enabling monitoring functions and the egress nodes are responsible for terminating them. All the nodes in the domain may participate in inband flow learning by executing the corresponding functions in the framework of Inband Flow Learning (IFL). The IFL framework includes three components: Service Discovery, Inband Flow Information Telemetry Deployment, and Inband Flow Information Telemetry Adjustment, as shown in Figure 1. Across these components, inband flow learning is embodied in automatic service discovery, automatic flow telemetry deployment, and automatic flow telemetry adjustment.¶
+---------+-------------------+------------------+------------------+
|Component|      Service      |   Inband Flow    |   Inband Flow    |
|         |     Discovery     |   Information    |   Information    |
|         |                   |    Telemetry     |    Telemetry     |
|         |                   |    Deployment    |    Adjustment    |
+---------+-------------------+------------------+------------------+
|Functions|  Sampling policy  | Telemetry policy |                  |
|         |-------------------+------------------+      Aging       |
|         |Flow characteristic|Telemetry instance|                  |
|         |    acquisition    |                  |                  |
+---------+-------------------+------------------+------------------+
Figure 1 Framework of Inband Flow Learning¶
Before telemetry starts on service flows, the service should be discovered in order to determine which flows should be monitored. The goal of the service discovery function is to obtain the flow characteristics, which are represented in terms of IP source address, IP destination address, TCP/UDP port number, VRF, incoming/outgoing interface, etc.¶
Automatic service discovery is implemented based on the sampling policy delivered by the control plane and flow characteristic acquisition on the forwarding plane, which is usually performed on the ingress node. A sampling policy is a set of rules that instruct the forwarding plane to identify service flow characteristics within a specific scope. Flow characteristic acquisition is a process in which the forwarding plane identifies, extracts, and reports service flow characteristics from the live traffic based on the sampling policy.¶
For example, if the service traffic to be monitored uses a particular port number, then to automatically discover all flows of the service identified by 5-tuple, a sampling policy can be configured to match the live traffic with that port number and generate flow information at the 5-tuple granularity. When live traffic passes through the ingress node, the forwarding plane filters traffic based on the specified sampling policy, identifies all flows with the particular port number, and reports the flows with their 5-tuple information. The automatically discovered service flow information can be stored in a distributed manner on the ingress nodes, or reported to the network controller for centralized management.¶
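The port-based example above can be sketched as follows. This is only an illustrative sketch, not a normative definition: the names FiveTuple, SamplingPolicy and discover_flows, and the sample addresses, are assumptions introduced here and do not appear in this framework.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple, Set


class FiveTuple(NamedTuple):
    """Flow characteristics at 5-tuple granularity (hypothetical layout)."""
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class SamplingPolicy:
    """Hypothetical sampling policy: match traffic by protocol and destination port."""
    protocol: str
    dst_port: int

    def matches(self, pkt: FiveTuple) -> bool:
        return pkt.protocol == self.protocol and pkt.dst_port == self.dst_port


def discover_flows(packets: Iterable[FiveTuple], policy: SamplingPolicy) -> Set[FiveTuple]:
    """Flow characteristic acquisition: distinct 5-tuples among matching packets."""
    return {pkt for pkt in packets if policy.matches(pkt)}


# Live traffic as seen by the ingress node (documentation addresses only).
traffic = [
    FiveTuple("10.0.0.1", "192.0.2.10", "tcp", 40001, 443),
    FiveTuple("10.0.0.2", "192.0.2.10", "tcp", 40002, 443),
    FiveTuple("10.0.0.1", "192.0.2.10", "tcp", 40001, 443),  # same flow seen again
    FiveTuple("10.0.0.3", "192.0.2.20", "udp", 5000, 53),    # outside the policy scope
]
flows = discover_flows(traffic, SamplingPolicy(protocol="tcp", dst_port=443))
```

In this sketch the resulting set could either stay on the ingress node or be reported to a controller; the choice does not change the matching logic.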
After acquiring the flow characteristics by service discovery, telemetry based on the inband flow information can be deployed automatically. Automatic flow telemetry deployment is implemented by creating telemetry instances based on telemetry policy, and executed on different types of network nodes in the domain according to the telemetry mode.¶
There are two modes to deploy inband flow information telemetry: End-to-End (E2E) and Hop-by-Hop (HbH). For the majority of services, E2E telemetry of service flows can meet the requirements of network operators by providing performance insight into the service as a whole. In E2E mode, shown in Figure 2, the ingress node discovers the characteristics of service flows and performs on-path telemetry on the flows to be monitored. The egress node needs to deploy the same monitored flows and complete the telemetry. If the telemetry data is not carried in the data packet but is reported by each node, a flow identifier is required to associate the data at the data consumer. Documents such as [RFC9326], [RFC9343] and [I-D.ietf-mpls-inband-pm-encapsulation] provide the encapsulation format of the flow identifier.¶
                    +-------------+
                    |Data Consumer| compute E2E flow info
                    +-------------+
                        |     |
      ____flow info_____|     |____flow info____
     |    telemetry                telemetry    |
     |                                          |
+---------+   +---------+    +---------+   +---------+
| Ingress |---| Transit | ...| Transit |---| Egress  |
|  Node   |   |  Node   |    |  Node   |   |  Node   |
+---------+   +---------+    +---------+   +---------+
Figure 2 End-to-End Telemetry Mode¶
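How a data consumer might associate per-node reports through the flow identifier can be sketched as below. The Report record, its fields and the correlate function are hypothetical and chosen only for illustration; the packet counters stand in for whatever data the chosen on-path telemetry technique exports, and the measurement method itself remains out of scope.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Report(NamedTuple):
    """Hypothetical per-node telemetry report sent to the data consumer."""
    node: str
    role: str      # "ingress" or "egress"
    flow_id: int   # flow identifier carried in the service packets
    packets: int   # illustrative counter only


def correlate(reports: List[Report]) -> Dict[int, Dict[str, Report]]:
    """Group reports by flow identifier and keep flows seen at both ends."""
    by_flow: Dict[int, Dict[str, Report]] = defaultdict(dict)
    for report in reports:
        by_flow[report.flow_id][report.role] = report
    return {
        flow_id: ends
        for flow_id, ends in by_flow.items()
        if {"ingress", "egress"} <= set(ends)
    }


reports = [
    Report("PE1", "ingress", 7, 1000),
    Report("PE2", "egress", 7, 998),
    Report("PE1", "ingress", 9, 500),  # no egress report yet for this flow
]
paired = correlate(reports)
delta = paired[7]["ingress"].packets - paired[7]["egress"].packets
```

Only flow 7 is paired here, because flow 9 has not yet been reported by an egress node.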
HbH mode differs from E2E mode in that transit nodes also participate in inband flow information learning and telemetry. In HbH mode, shown in Figure 3, telemetry covers the flow information on every node of the forwarding path along which the flow packets are transmitted, which provides detailed flow information for each hop. Hop-by-Hop telemetry is typically used for on-demand fault diagnosis.¶
                    +-------------+
                    |Data Consumer| compute HbH flow info
                    +-------------+
                      |  |   |  |    flow info telemetry
      ________________|  |   |  |_______________
     |              _____|   |____              |
     |             |              |             |
+---------+   +---------+    +---------+   +---------+
| Ingress |---| Transit | ...| Transit |---| Egress  |
|  Node   |   |  Node   |    |  Node   |   |  Node   |
+---------+   +---------+    +---------+   +---------+
Figure 3 Hop-by-Hop Telemetry Mode¶
Telemetry policy is used to determine which flows should be monitored. By configuring a telemetry policy, an operator can raise the priority of learning and telemetry for critical flows and reduce or filter out the learning and telemetry of unimportant flows. This is crucial to network deployment for two reasons: the number of flows can be huge, and the processing capability of both the controller and the network nodes is limited. There might be millions of flows in a large-scale network, for example a 5G mobile backhaul network. It is important to choose the granularity of inband flow information telemetry wisely.¶
For IP traffic, the telemetry policy can be based on one flow characteristic or a combination of them, such as IP source/destination address, TCP/UDP port number, VRF, or network device interface. An IP address with a flexible wildcard mask can also be used to apply a telemetry policy to an aggregation of flows.¶
An Inband Flow Information Telemetry Instance (IFITI), called a telemetry instance for short, is the management object of a monitored flow for the deployment of flow-oriented on-path telemetry techniques under the IFL framework. During its life cycle, an IFITI is responsible for providing performance telemetry data on the nodes traversed by the flow it monitors.¶
On ingress nodes, IFITIs can be generated automatically, in either a distributed or a centralized way, by applying telemetry policies to automatically discovered service flows. The transit nodes and egress nodes can also generate IFITIs automatically, without any configured flow characteristics, by learning special information about the monitored flows which is embedded by the ingress nodes. The flow identifier is one such piece of information: it may be a value, unique within a domain, that is encapsulated in the service packets to establish the relationship between the characteristic information, the telemetry instance and the service flow. It can not only correlate the telemetry data of flows on each node, as mentioned in the previous section, but also serve as the key marker by which the forwarding plane identifies the monitored flow. For the forwarding plane, it is much easier to identify a single piece of data in a service packet than to identify various types of flow characteristics.¶
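A telemetry policy that combines a wildcard-masked destination address with an optional port can be sketched as follows. The sketch assumes the common router convention in which a wildcard bit set to 1 means "ignore this bit"; this framework does not define the mask semantics, and the names TelemetryPolicy and wildcard_match are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """True if addr equals base on every bit where the wildcard bit is 0.

    Assumption: wildcard bits set to 1 are "don't care" bits, and they
    need not be contiguous (a flexible wildcard mask).
    """
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


@dataclass(frozen=True)
class TelemetryPolicy:
    """Hypothetical policy selecting an aggregation of flows for telemetry."""
    dst_base: str
    dst_wildcard: str
    dst_port: Optional[int] = None  # None means any port

    def selects(self, dst_ip: str, dst_port: int) -> bool:
        if self.dst_port is not None and dst_port != self.dst_port:
            return False
        return wildcard_match(dst_ip, self.dst_base, self.dst_wildcard)


# Monitor every flow towards 192.0.2.0 with the last octet ignored, port 443 only.
policy = TelemetryPolicy(dst_base="192.0.2.0", dst_wildcard="0.0.0.255", dst_port=443)
```

A coarser mask or a missing port constraint aggregates more flows into one policy, which is one way to trade telemetry granularity against processing load.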
The following uses the flow identifier as an example to describe the flow learning process on transit and egress nodes. Once the telemetry instance is created, the ingress node can start the telemetry of flow information using the chosen on-path telemetry technique. At the same time, the ingress node encodes inband monitoring information, including the identifier, in the service packets. When a service flow packet passes through a transit node or egress node and the node detects that the packet contains a flow identifier, the node treats the packet as belonging to a monitored service flow and automatically creates a telemetry instance using the identifier as the key.¶
The automatic creation of telemetry instances on network nodes can greatly facilitate dynamic and incremental deployment. On all types of nodes, network operators do not need to statically configure the characteristics of monitored flows, which saves a lot of workload and reduces the probability of errors in a large-scale deployment scenario. When the path of a monitored flow changes, the flow can be automatically detected on the nodes of the new path and the corresponding telemetry instances can be automatically deployed.¶
When route convergence happens in the network, a service flow may switch to other forwarding nodes. When the traffic changes, the telemetry instances change as well. For a telemetry instance running on the failed path, aging of the IFITI should be supported in order to reclaim network resources: an IFITI should be deleted once it becomes stale. To keep monitoring the same flow, a new telemetry instance needs to be added on the new transit or egress node. Note that aging and adjustment of an IFITI can be initiated by the controller or by the network node. When the timer associated with the telemetry of a flow expires, the IFITI is deleted to stop the telemetry of that flow.¶
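The learning step on a transit or egress node can be sketched as below. The classes TelemetryInstance and NodeTelemetry are hypothetical, and the packet is reduced to its optional flow identifier; how the identifier is parsed from a real encapsulation is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TelemetryInstance:
    """Hypothetical IFITI state kept on a node for one monitored flow."""
    flow_id: int
    packets: int = 0


class NodeTelemetry:
    """Hypothetical per-node table of telemetry instances keyed by flow identifier."""

    def __init__(self) -> None:
        self.instances: Dict[int, TelemetryInstance] = {}

    def on_packet(self, flow_id: Optional[int]) -> Optional[TelemetryInstance]:
        """Process one packet; flow_id is None when no identifier is present."""
        if flow_id is None:
            return None  # not a monitored flow, nothing is learned
        instance = self.instances.get(flow_id)
        if instance is None:
            # First packet of this flow on this node: create the instance
            # automatically, without any configured flow characteristics.
            instance = TelemetryInstance(flow_id)
            self.instances[flow_id] = instance
        instance.packets += 1
        return instance


transit = NodeTelemetry()
for seen_id in [7, None, 7, 9]:
    transit.on_packet(seen_id)
```

After these four packets the node holds instances for flow identifiers 7 and 9 only; the packet without an identifier leaves no state behind.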
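Timer-based aging on a node can be sketched as follows. The AgingTable class and the 30-second timeout are assumptions made for illustration; this framework does not specify a timer value, and timestamps are passed in explicitly so that the example is deterministic.

```python
from typing import Dict, List


class AgingTable:
    """Hypothetical aging of telemetry instances based on an idle timeout."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[int, float] = {}  # flow identifier -> last packet time

    def touch(self, flow_id: int, now: float) -> None:
        """Record that a packet of the monitored flow was seen at time now."""
        self.last_seen[flow_id] = now

    def age_out(self, now: float) -> List[int]:
        """Delete instances idle for at least the timeout; return their identifiers."""
        stale = [
            flow_id
            for flow_id, seen in self.last_seen.items()
            if now - seen >= self.timeout
        ]
        for flow_id in stale:
            del self.last_seen[flow_id]
        return stale


table = AgingTable(timeout=30.0)
table.touch(1, now=0.0)   # flow 1 later moves to another path
table.touch(2, now=0.0)
table.touch(2, now=25.0)  # flow 2 is still traversing this node
removed = table.age_out(now=40.0)
```

Here flow 1 is aged out because it has been idle for 40 seconds, while flow 2 is kept because its last packet was seen 15 seconds earlier.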
This document makes no request of IANA.¶
TBD¶