Internet-Draft Inband Flow Learning Framework July 2023
Han, et al. Expires 28 January 2024
Workgroup:
OPSAWG Working Group
Internet-Draft:
draft-hwy-opsawg-ifl-framework-04
Published:
Intended Status:
Informational
Expires:
Authors:
L. Han
China Mobile
M. Wang
China Mobile
X. Wang
Huawei
T. Zhou
Huawei

Inband Flow Learning Framework

Abstract

On-path telemetry techniques can provide high-precision inband flow insight and real-time network performance monitoring by embedding instructions or metadata into user packets. They are beneficial but still have deployability and flexibility problems in large-scale deployment scenarios. This document proposes a reference framework called Inband Flow Learning (IFL), which outlines the architecture and functional modules for automatic deployment and adjustment of flow-oriented monitoring using on-path telemetry techniques, as a reference solution to these problems. This document also provides different deployment approaches and considerations for practical network deployment.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 28 January 2024.

1. Introduction

On-path telemetry techniques described in [I-D.song-opsawg-ifit-framework], such as IOAM [RFC9197] and Alternate-Marking [RFC9341], can provide high-precision inband flow insight and real-time network performance monitoring (e.g., jitter, latency, packet loss) by embedding instructions or metadata into user packets. They are beneficial for network operation because they monitor live traffic in the network, based on inband flow information telemetry along the entire forwarding path.

However, when flow-oriented monitoring using on-path telemetry techniques is deployed on live traffic, flow characteristics or paths may change, which makes the traditional static configuration mode no longer applicable. [I-D.hwyh-ippm-ps-inband-flow-learning] states the problems of flow identification when applying on-path telemetry techniques in real network scenarios, and describes the requirements for an inband flow learning mechanism, which is intended to address the problems of deployability and flexibility. This document proposes a reference framework called Inband Flow Learning (IFL), which outlines the architecture and functional modules for automatic deployment and adjustment of flow-oriented monitoring using on-path telemetry techniques. This document also provides different deployment approaches and considerations for practical network deployment. Note that this document focuses on the generation of inband flow telemetry objects; inband flow performance measurement methods are out of the scope of this document.

2. Terminology and Conventions

2.1. Requirement Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2. Terminology

IFL: Inband Flow Learning

IFITI: Inband Flow Information Telemetry Instance

3. Framework of Inband Flow Learning

The domain of inband flow information telemetry consists of ingress nodes, transit nodes, and egress nodes. The ingress nodes are responsible for enabling monitoring functions and the egress nodes are responsible for terminating them. All the nodes in the domain may participate in inband flow learning by executing the corresponding functions in the framework of Inband Flow Learning (IFL). The framework of IFL includes three components, shown in Figure 1: Service Discovery, Inband Flow Information Telemetry Deployment, and Inband Flow Information Telemetry Adjustment. Across these components, inband flow learning is embodied in automatic service discovery, automatic flow telemetry deployment, and automatic flow telemetry adjustment.

   +---------+-------------------+------------------+------------------+
   |Component|      Service      |   Inband Flow    |   Inband Flow    |
   |         |     Discovery     |   Information    |   Information    |
   |         |                   |    Telemetry     |    Telemetry     |
   |         |                   |    Deployment    |    Adjustment    |
   +---------+-------------------+------------------+------------------+
   |Functions|  Sampling policy  | Telemetry policy |                  |
   |         |-------------------+------------------+       Aging      |
   |         |Flow characteristic|Telemetry instance|                  |
   |         |    acquisition    |                  |                  |
   +---------+-------------------+------------------+------------------+

Figure 1 Framework of Inband Flow Learning

3.1. Service Discovery

Before telemetry is started on service flows, the service should be discovered in order to determine which flows should be monitored. The goal of the service discovery function is to obtain the flow characteristics, which are represented in terms of IP source address, IP destination address, TCP/UDP port number, VRF, incoming/outgoing interface, etc.

Automatic service discovery is implemented based on the sampling policy delivered by the control plane and flow characteristic acquisition on the forwarding plane, and is usually performed on the ingress node. Sampling policy is a set of rules that instruct the forwarding plane to identify service flow characteristics within a specific scope. Flow characteristic acquisition is a process in which the forwarding plane identifies, extracts, and reports service flow characteristics from the live traffic based on the sampling policy.

For example, if the service traffic to be monitored has a particular port number, then to automatically discover all flows of the service identified by 5-tuple, a sampling policy can be configured to match the live traffic with the particular port number and generate flow information at the 5-tuple granularity. When live traffic passes through the ingress node, the forwarding plane filters traffic based on the specified sampling policy, identifies all flows with the particular port number, and reports the flows with 5-tuple information. The automatically discovered service flow information can be stored in a distributed manner on the ingress node, or reported to the network controller for centralized management.
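
The port-based example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a defined data model: the names FiveTuple, SamplingPolicy, and discover_flows are hypothetical, and the policy is reduced to a destination port with an optional protocol.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class SamplingPolicy:
    # Hypothetical rule: match on destination port, optionally on protocol.
    dst_port: int
    protocol: Optional[str] = None

    def matches(self, pkt: FiveTuple) -> bool:
        if pkt.dst_port != self.dst_port:
            return False
        return self.protocol is None or pkt.protocol == self.protocol


def discover_flows(packets: Iterable[FiveTuple],
                   policy: SamplingPolicy) -> Set[FiveTuple]:
    """Return the distinct 5-tuple flows among packets matching the policy."""
    return {pkt for pkt in packets if policy.matches(pkt)}


packets = [
    FiveTuple("10.0.0.1", "10.0.1.1", "tcp", 40001, 443),
    FiveTuple("10.0.0.1", "10.0.1.1", "tcp", 40001, 443),  # same flow again
    FiveTuple("10.0.0.2", "10.0.1.1", "tcp", 40002, 443),
    FiveTuple("10.0.0.3", "10.0.1.9", "udp", 40003, 53),   # filtered out
]
flows = discover_flows(packets, SamplingPolicy(dst_port=443, protocol="tcp"))
```

In this sketch the resulting set plays the role of the discovered service flow information, which could be kept on the ingress node or reported to the controller.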

3.2. Inband Flow Information Telemetry Deployment

After the flow characteristics are acquired through service discovery, telemetry based on the inband flow information can be deployed automatically. Automatic flow telemetry deployment is implemented by creating telemetry instances based on telemetry policy, and is executed on different types of network nodes in the domain according to the telemetry mode.

3.2.1. Telemetry Mode

There are two modes for deploying inband flow information telemetry: End-to-End (E2E) and Hop-by-Hop (HbH). For the majority of services, E2E telemetry of service flows can meet the requirements of network operators by providing an overall performance view of the service. In E2E mode, shown in Figure 2, the ingress node discovers the characteristics of service flows and performs on-path telemetry on the flows to be monitored. The egress node needs to deploy the same monitored flows and complete the telemetry. If the telemetry data is not carried in the data packet but is reported by each node, a flow identifier is required so that the data consumer can associate the data. Documents such as [RFC9326], [RFC9343], and [I-D.ietf-mpls-inband-pm-encapsulation] provide encapsulation formats for the flow identifier.

                    +-------------+
                    |Data Consumer| compute E2E flow info
                    +-------------+
                       |        |
         ___flow info__|        |____flow info____
        |   telemetry                telemetry    |
        |                                         |
 +---------+   +---------+    +---------+   +---------+
 | Ingress |---| Transit | ...| Transit |---| Egress  |
 |   Node  |   |   Node  |    |   Node  |   |   Node  |
 +---------+   +---------+    +---------+   +---------+

Figure 2 End-to-End Telemetry Mode
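
As a rough illustration of how a data consumer might use the flow identifier to associate per-node reports in E2E mode, consider the following Python sketch. The report fields, the node names, and the use of a packet-count difference are assumptions made for the example only; the measurement method itself is out of scope of this document.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class NodeReport:
    flow_id: int   # flow identifier carried in the service packets
    node: str      # reporting node name (hypothetical)
    role: str      # "ingress" or "egress"
    packets: int   # packets counted for this flow in one period


def correlate_e2e(reports: Iterable[NodeReport]) -> Dict[int, int]:
    """Pair ingress and egress reports by flow_id; return packets lost per flow."""
    by_flow: Dict[int, Dict[str, int]] = defaultdict(dict)
    for r in reports:
        by_flow[r.flow_id][r.role] = r.packets
    # Only flows reported by both ends can be evaluated end to end.
    return {
        fid: counts["ingress"] - counts["egress"]
        for fid, counts in by_flow.items()
        if "ingress" in counts and "egress" in counts
    }


reports = [
    NodeReport(flow_id=7, node="PE1", role="ingress", packets=1000),
    NodeReport(flow_id=7, node="PE2", role="egress", packets=998),
    NodeReport(flow_id=9, node="PE1", role="ingress", packets=500),  # no egress report yet
]
loss = correlate_e2e(reports)
```

Here flow 7 is reported by both ends and can be evaluated, while flow 9 is skipped until its egress report arrives.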

HbH mode differs from E2E mode in that transit nodes also participate in inband flow information learning and telemetry. In HbH mode, shown in Figure 3, telemetry covers the flow information on every node of the forwarding path that the flow packets traverse, which provides detailed flow information for each hop. Hop-by-Hop telemetry is usually used for on-demand fault diagnosis.

                  +-------------+
                  |Data Consumer| compute HbH flow info
                  +-------------+
                    |   |  |   |  flow info telemetry
      ______________|   |  |   |_________________
     |               ___|  |___                  |
     |              |          |                 |
 +---------+   +---------+    +---------+   +---------+
 | Ingress |---| Transit | ...| Transit |---| Egress  |
 |   Node  |   |   Node  |    |   Node  |   |   Node  |
 +---------+   +---------+    +---------+   +---------+

Figure 3 Hop-by-Hop Telemetry Mode

3.2.2. Telemetry Policy

Telemetry policy is used to determine which flows should be monitored. Configuring a telemetry policy can raise the priority of learning and telemetry for critical flows and reduce or filter out the learning and telemetry of unimportant flows. It is crucial to network deployment for two reasons: the number of flows can be huge, and the processing capability of the controller and of the network nodes is limited. There might be millions of flows in a large-scale network, for example a 5G mobile backhaul network. It is therefore important to choose the granularity of inband flow information telemetry wisely.

For IP traffic, the telemetry policy can be based on one or a combination of flow characteristics, such as IP source/destination address, TCP/UDP port number, VRF, or network device interface. An IP address with a flexible wildcard mask can also be used to apply a telemetry policy to an aggregation of flows.
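
A telemetry policy that selects an aggregation of flows with a wildcard mask could look like the following Python sketch. It assumes IPv4 and the common ACL convention that a wildcard-mask bit of 1 means "ignore this bit"; this document does not define the mask semantics, and the TelemetryPolicy class and its fields are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


def _ip(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


@dataclass(frozen=True)
class TelemetryPolicy:
    # Assumption: a wildcard bit of 1 means "ignore this bit" (ACL style).
    dst_base: str
    dst_wildcard: str
    dst_port: Optional[int] = None  # None means any port

    def selects(self, dst_ip: str, dst_port: int) -> bool:
        care_bits = ~_ip(self.dst_wildcard) & 0xFFFFFFFF
        # Any difference in a bit we care about means the address is out of scope.
        if (_ip(dst_ip) ^ _ip(self.dst_base)) & care_bits:
            return False
        return self.dst_port is None or dst_port == self.dst_port


# Select port-443 flows toward 10.x.0.y: the second and fourth octets are wildcarded.
policy = TelemetryPolicy(dst_base="10.0.0.0", dst_wildcard="0.255.0.255", dst_port=443)
```

Because the mask need not be contiguous, a single policy entry of this kind can cover hosts spread across many subnets, which keeps the number of configured entries small when the number of flows is large.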

3.2.3. Telemetry Instance

An Inband Flow Information Telemetry Instance (IFITI), or telemetry instance for short, is the management object of a monitored flow for the deployment of flow-oriented on-path telemetry techniques under the framework of IFL. During its life cycle, an IFITI is responsible for providing performance telemetry data on the nodes traversed by the flow it monitors.

On ingress nodes, IFITIs can be automatically generated in either a distributed or a centralized way by applying telemetry policies to automatically discovered service flows. The transit nodes and egress nodes can also automatically generate IFITIs, without flow characteristics being configured, by learning special information about the monitored flows that is embedded by the ingress nodes. The flow identifier is one such piece of information: it may be a value, unique within a domain, that is encapsulated in the service packets to establish the relationship between the characteristic information, the telemetry instance, and the service flow. It can not only correlate the telemetry data of flows on each node, as mentioned in Section 3.2.1, but also serve as the key marker by which the forwarding plane identifies the monitored flow. For the forwarding plane, it is much easier to identify a piece of data in a service packet than to identify various types of flow characteristics.

The following uses the flow identifier as an example to describe the flow learning process on transit and egress nodes. Once the telemetry instance is created, the ingress node can start telemetry of flow information using the on-path telemetry technique in use. At the same time, the ingress node encodes inband monitoring information, including the flow identifier, in the service packets. When a service flow packet passes through a transit node or egress node and the node detects that the packet contains a flow identifier, the node treats the packet as belonging to a monitored service flow and automatically creates a telemetry instance using the identifier as the key.
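
The learning step on a transit or egress node can be sketched as follows. This is a minimal Python illustration, assuming the flow identifier has already been parsed from the packet; the TelemetryInstance and FlowLearningNode names and the packet counter are hypothetical and stand in for whatever state a real implementation keeps.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TelemetryInstance:
    flow_id: int
    packets: int = 0


class FlowLearningNode:
    """Transit or egress node that learns monitored flows from the flow identifier."""

    def __init__(self) -> None:
        self.instances: Dict[int, TelemetryInstance] = {}

    def on_packet(self, flow_id: Optional[int]) -> Optional[TelemetryInstance]:
        # Packets without a flow identifier are not monitored and create no state.
        if flow_id is None:
            return None
        inst = self.instances.get(flow_id)
        if inst is None:
            # First packet of a monitored flow: create the instance keyed by flow_id.
            inst = TelemetryInstance(flow_id)
            self.instances[flow_id] = inst
        inst.packets += 1
        return inst


node = FlowLearningNode()
for fid in (7, None, 7, 9):
    node.on_packet(fid)
```

Note that the node never inspects the 5-tuple: the presence of the identifier alone decides whether an instance is created.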

The automatic creation of telemetry instances on network nodes can greatly facilitate dynamic and incremental deployment. On all types of nodes, network operators do not need to statically configure the characteristics of monitored flows, which reduces both the configuration workload and the probability of errors in a large-scale deployment scenario. When the path of a monitored flow changes, the flow can be automatically detected on the nodes of the new path and the corresponding telemetry instances can be automatically deployed.

4. Inband Flow Information Telemetry Adjustment

When route convergence occurs in the network, a service flow may switch to other forwarding nodes, and the set of telemetry instances changes with the traffic. For telemetry instances running on the failed path, aging of IFITIs should be supported in order to reclaim network resources: an IFITI should be deleted once it becomes stale. To keep monitoring the same flow, new telemetry instances need to be added on the new transit or egress nodes. Note that aging and adjustment of IFITIs can be initiated by the controller or by the network node. When the timer associated with a flow's telemetry expires, the IFITI is deleted and telemetry of the flow stops.
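
Timer-based aging on a node can be sketched as below. This Python example assumes an idle timeout: an instance is considered stale when no packet of its flow has been seen for a configured interval. The AgingTable name, the 30-second value, and the explicit `now` arguments are hypothetical choices made to keep the example deterministic.

```python
from typing import Dict, List


class AgingTable:
    """Tracks when each telemetry instance last saw traffic and ages out stale ones."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # hypothetical idle timeout, in seconds
        self.last_seen: Dict[int, float] = {}

    def touch(self, flow_id: int, now: float) -> None:
        # Called whenever a packet of the monitored flow is seen; restarts the timer.
        self.last_seen[flow_id] = now

    def age_out(self, now: float) -> List[int]:
        # Delete every instance idle for at least `timeout`; return their flow IDs.
        stale = [fid for fid, seen in self.last_seen.items()
                 if now - seen >= self.timeout]
        for fid in stale:
            del self.last_seen[fid]
        return stale


table = AgingTable(timeout=30.0)
table.touch(7, now=0.0)
table.touch(9, now=0.0)
table.touch(9, now=25.0)     # flow 9 is still active on this node
removed = table.age_out(now=40.0)
```

In this run, flow 7 has moved away from the node and is aged out, while flow 9 keeps its instance. A controller-initiated variant would perform the same deletion in response to an explicit request rather than a local timer.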

5. IANA Considerations

This document makes no requests of IANA.

6. Security Considerations

TBD

7. References

7.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.

7.2. Informative References

[I-D.hwyh-ippm-ps-inband-flow-learning]
Han, L., Wang, M., Wang, X., and J. Huang, "Problem Statement and Requirement for Inband Flow Learning", Work in Progress, Internet-Draft, draft-hwyh-ippm-ps-inband-flow-learning-03, <https://datatracker.ietf.org/api/v1/doc/document/draft-hwyh-ippm-ps-inband-flow-learning/>.
[I-D.ietf-mpls-inband-pm-encapsulation]
Cheng, W., Min, X., Zhou, T., Dai, J., and Y. Peleg, "Encapsulation For MPLS Performance Measurement with Alternate Marking Method", Work in Progress, Internet-Draft, draft-ietf-mpls-inband-pm-encapsulation-06, <https://datatracker.ietf.org/doc/html/draft-ietf-mpls-inband-pm-encapsulation-06>.
[I-D.song-opsawg-ifit-framework]
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "Framework for In-situ Flow Information Telemetry", Work in Progress, Internet-Draft, draft-song-opsawg-ifit-framework-20, <https://datatracker.ietf.org/doc/html/draft-song-opsawg-ifit-framework-20>.
[RFC9197]
Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, Ed., "Data Fields for In Situ Operations, Administration, and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, <https://www.rfc-editor.org/info/rfc9197>.
[RFC9326]
Song, H., Gafni, B., Brockners, F., Bhandari, S., and T. Mizrahi, "In Situ Operations, Administration, and Maintenance (IOAM) Direct Exporting", RFC 9326, DOI 10.17487/RFC9326, <https://www.rfc-editor.org/info/rfc9326>.
[RFC9341]
Fioccola, G., Ed., Cociglio, M., Mirsky, G., Mizrahi, T., and T. Zhou, "Alternate-Marking Method", RFC 9341, DOI 10.17487/RFC9341, <https://www.rfc-editor.org/info/rfc9341>.
[RFC9343]
Fioccola, G., Zhou, T., Cociglio, M., Qin, F., and R. Pang, "IPv6 Application of the Alternate-Marking Method", RFC 9343, DOI 10.17487/RFC9343, <https://www.rfc-editor.org/info/rfc9343>.

Authors' Addresses

Liuyan Han
China Mobile
Beijing
China
Minxue Wang
China Mobile
Beijing
China
Xuanxuan Wang
Huawei
Nanjing
China
Tianran Zhou
Huawei
Beijing
China