Internet-Draft | Secondary-Label-applications | July 2023 |
Mohanty, et al. | Expires 11 January 2024 | [Page] |
This draft uses the concept of a secondary label to solve a few cases in L3VPN deployments. In BGP VPN networks, BGP speakers associate a local MPLS label with a route when the next-hop is reset and advertise that label to their peers. The receiving peer installs this "received" label in its forwarding table and forwards traffic to the sending router using this label. In some deployments, a different label needs to be sent. We illustrate this with two use-cases.¶
This draft presents a method in which this label is encoded in a newly defined attribute that is advertised with the BGP updates for these use-cases.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 11 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].¶
In BGP L3VPN networks, BGP speakers associate a local MPLS label with a route when the next-hop is reset and advertise that label to other peers. The receiving peer installs this "received" label in its forwarding table and directs traffic to the sending router using this label. This local label allocation is governed by the configured label allocation mode. Most vendors already offer several allocation modes, such as per-vrf, per-prefix, per-next-hop and per-nexthop-per-received-label.¶
In certain cases, allocating only the local label is not sufficient. In this draft, we outline use-cases where an additional label, hereby referred to as the secondary label, must be allocated and communicated to the BGP peer. Using this secondary label, the peer can make forwarding decisions and solve some use-cases that are difficult to address with the standard local-label allocation alone.¶
The standard behavior of an Option-B ASBR [RFC4364] is to allocate a per-prefix label for VPN prefixes. To conserve label space at the ASBR, many vendors implement a label allocation mode called per-nexthop-received-label. With per-nexthop-received-label, all prefixes received with the same next-hop and the same received label (together these constitute the label context) are assigned the same local label. This approach conserves label space by avoiding the allocation of a unique label for each prefix. In the primary/backup case, the context of the label allocation is the set of tuples {(Nexthop, recvd-label)}.¶
The above implementation (originally meant for the ASBR) also applies to an RR with next-hop-self. In the topology below (representative of a tier 1 provider topology), RR1 and RR2 are configured with the per-nexthop-received-label mode and have next-hop-self towards each other. Both RRs receive the VPN prefix (RD 1:1: 2.2.2.2/32) from R1 with its connected address as the next-hop and advertise it to the other RR on the cross-link after resetting the next-hop to self.¶
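The per-nexthop-received-label mode described above can be sketched as a table keyed on the label context. This is only a minimal illustration under our own assumptions: the class name, the starting label value 16 and the addresses (taken from the documentation range) are hypothetical and do not describe any vendor implementation.

```python
from itertools import count


class PerNexthopReceivedLabelAllocator:
    """Hypothetical allocator: one local label per label context.

    A context is the set of (next-hop, received-label) tuples of the
    paths in use (primary only, or primary plus backup).
    """

    def __init__(self, first_label=16):
        # Assumption: labels are handed out sequentially from first_label.
        self._next_label = count(first_label)
        self._by_context = {}

    def allocate(self, paths):
        # The context is order-independent, so a frozenset is used as key.
        context = frozenset(paths)
        if context not in self._by_context:
            self._by_context[context] = next(self._next_label)
        return self._by_context[context]


alloc = PerNexthopReceivedLabelAllocator()

# Two prefixes learned with the same next-hop and received label
# share one local label.
label_a = alloc.allocate([("192.0.2.1", 100)])
label_b = alloc.allocate([("192.0.2.1", 100)])

# Adding a backup path changes the context, hence a different label.
label_c = alloc.allocate([("192.0.2.1", 100), ("192.0.2.2", 200)])
```

In this sketch, any change to a backup path's received label produces a new context and therefore a new local label.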
Although we do not explain it here, a similar topology can arise in a dual Option-B deployment where the ASBRs have each other as backup [RFC2119].¶
Figure 1 represents an all-IBGP network. PE1 originates VPN routes and advertises them to RR1 and RR2. Both RRs are also clients of each other and advertise VPN routes to each other with the next-hop set to the peering address. Each RR considers the path from PE1 as the best path and the path from the other RR as the backup (BGP PIC for VPNv4 and VPNv6 is configured). Label mode per-nexthop-received-label is configured.¶
The root cause of the label churn is that the local label at RR1 (and likewise at RR2) is an input to the label allocation context at RR2, and the resulting local label allocated at RR2 in turn serves as an input to the label allocation context at RR1. Because of this feedback loop, the RRs quickly run out of label space.¶
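The feedback loop can be illustrated with a small simulation. This is a sketch under simplifying assumptions of our own: the label values are arbitrary, the RRs take turns re-advertising, and stale labels are never released. It is not a model of any particular implementation.

```python
from itertools import count

# Hypothetical primary path: (next-hop, received label) learned from PE1.
PE_PATH = ("PE1", 24001)


def simulate_churn(rounds):
    """Return how many local labels each RR has allocated after `rounds`."""
    next_label = {"RR1": count(1000), "RR2": count(2000)}
    tables = {"RR1": {}, "RR2": {}}          # context -> local label
    advertised = {"RR1": None, "RR2": None}  # last local label sent to peer

    def local_label(rr, peer):
        # Context = primary path plus (if known) the backup path via the
        # peer RR, identified by the label the peer last advertised.
        paths = {PE_PATH}
        if advertised[peer] is not None:
            paths.add((peer, advertised[peer]))
        context = frozenset(paths)
        if context not in tables[rr]:
            tables[rr][context] = next(next_label[rr])
        return tables[rr][context]

    for _ in range(rounds):
        # Each new label from one RR changes the other RR's context.
        advertised["RR1"] = local_label("RR1", "RR2")
        advertised["RR2"] = local_label("RR2", "RR1")

    return {rr: len(table) for rr, table in tables.items()}


print(simulate_churn(5))
```

In this sketch every exchange creates a new context on the other RR, so the number of allocated labels grows with the number of rounds instead of settling.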
Notice that if the RRs have the per-prefix label allocation mode configured, then this sort of oscillation will not happen. However, the per-prefix label allocation in an RR with next-hop-self configured will also mean a unique label for every unique prefix and that is not scalable.¶
ISP1 and ISP2 are CE devices that establish an EBGP session with PE1 and PE2 respectively. Both ISPs advertise the same 700k prefixes/routes to PE1 and PE2. Both PE1 and PE2 only send the default route to the remote PE, PE0.¶
Failure scenario 1 (FS#1): When the ISP1-PE1 link goes down, after BGP at PE1 converges, traffic will traverse the link through PE2 and then reach ISP1.¶
Failure scenario 2 (FS#2): The links from ISP1 to PE1 and PE2 are down at the same time; traffic will go to ISP2 after BGP convergence at PE1 and PE2.¶
FS#1 is a classic example of a case where BGP PIC is appropriate. It is therefore no surprise that convergence is good. However, in the case of FS#2, with BGP PIC in place, this is what happens:¶
The conclusion is that BGP PIC by itself is not adequate to handle the convergence issues arising from double link failures.¶
To solve the issue of Section 4.1, the concept of a secondary label is introduced. At both RRs, in addition to the local label, another label, hereby referred to as the secondary label, is also allocated. This secondary label depends exclusively on the primary path, i.e. only on the path from PE1 and not on the path from RR2. The secondary label is encoded inside an attribute called the secondary label attribute, which is advertised along with the BGP best-path advertisement to RR2 and PE2. The format of the secondary label attribute is described in Section 6. A similar concept is described in [I-D.kaliraj-idr-multinexthop-attribute], but the next-hop and other fields are not included in the secondary label attribute.¶
When RR2 receives the update from RR1 that it selects as its backup path and finds the secondary label attribute, it will only consider the label encoded in the secondary label attribute and ignore the received label in its local-label allocation decision. It will also program the label encoded in the secondary label attribute instead of the received label in the forwarding imposition. As the secondary label only depends on the primary path from PE1, it is unaffected by the advertisement from the other RR, and the continuous label churn is arrested immediately.¶
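The effect of the secondary label on the allocation loop can be sketched as follows. The class and method names, the label values and the turn-by-turn exchange are illustrative assumptions of ours. The only behavior taken from the text above is that the secondary label depends on the primary path alone and that the receiver uses it, rather than the received label, in its allocation context.

```python
from itertools import count

# Hypothetical primary path: (next-hop, received label) learned from PE1.
PE_PATH = ("PE1", 24001)


class RouteReflector:
    def __init__(self, name, first_label):
        self.name = name
        self._labels = count(first_label)
        self._table = {}  # allocation key -> label

    def _label_for(self, key):
        if key not in self._table:
            self._table[key] = next(self._labels)
        return self._table[key]

    def secondary_label(self):
        # Depends only on the primary path, never on the peer RR.
        return self._label_for(("secondary", frozenset({PE_PATH})))

    def local_label(self, peer_name, peer_secondary):
        # The backup path is identified by the peer's secondary label;
        # the peer's ordinary (received) label is ignored.
        paths = {PE_PATH}
        if peer_secondary is not None:
            paths.add((peer_name, peer_secondary))
        return self._label_for(("local", frozenset(paths)))

    def labels_in_use(self):
        return len(self._table)


def simulate_with_secondary(rounds):
    rr1 = RouteReflector("RR1", 1000)
    rr2 = RouteReflector("RR2", 2000)
    sec1 = sec2 = None
    history = []
    for _ in range(rounds):
        loc1 = rr1.local_label("RR2", sec2)
        sec1 = rr1.secondary_label()
        loc2 = rr2.local_label("RR1", sec1)
        sec2 = rr2.secondary_label()
        history.append((loc1, loc2))
    return history, rr1.labels_in_use(), rr2.labels_in_use()


history, used_rr1, used_rr2 = simulate_with_secondary(10)
print(history[-1], used_rr1, used_rr2)
```

Because the secondary labels never change once allocated, the local labels in this sketch settle after the first exchange and the number of labels in use stays constant.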
Without loss of generality, consider PE2 as the DUT. The main reason BGP PIC (as described above) cannot help in this case is that the status of the primary link on the peer PE, i.e. the PE1-ISP1 link on PE1, is unknown to PE2 until it receives the corresponding BGP withdraw. The main underlying idea of our proposed solution is as follows.¶
A new Optional Transitive Attribute will be created for carrying the secondary label. This attribute will be referred to as the secondary label attribute. The format is as specified below.¶
The secondary label attribute contains a Flags field (1 byte) and a set of entries, each consisting of a Type (1 byte) and a Label (3 bytes). The flag bits will be specified in the future. The label type denotes the context: for PS#1 the type is 0, and for PS#2 the type is 1. As more use-cases are identified, further types will be assigned.¶
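A possible encoding of the attribute value can be sketched as below. The field sizes follow the description above. Everything else is an assumption on our part: the function names are hypothetical, the BGP attribute header (flags, type code, length) is omitted, and the 20-bit label is assumed to occupy the high-order bits of the 3-byte Label field with the low-order 4 bits set to zero.

```python
TYPE_PS1 = 0  # label type for PS#1
TYPE_PS2 = 1  # label type for PS#2


def encode_secondary_label_value(entries, flags=0):
    """Encode Flags (1 byte) followed by (Type, Label) entries of 1 + 3 bytes."""
    out = bytearray([flags & 0xFF])
    for label_type, label in entries:
        if not 0 <= label < (1 << 20):
            raise ValueError("MPLS label must fit in 20 bits")
        out.append(label_type & 0xFF)
        # Assumption: label in the high-order 20 bits, low 4 bits zero.
        out += (label << 4).to_bytes(3, "big")
    return bytes(out)


def decode_secondary_label_value(data):
    """Decode the value produced by encode_secondary_label_value."""
    if len(data) < 1 or (len(data) - 1) % 4 != 0:
        raise ValueError("malformed secondary label attribute value")
    flags = data[0]
    entries = []
    for offset in range(1, len(data), 4):
        label_type = data[offset]
        label = int.from_bytes(data[offset + 1:offset + 4], "big") >> 4
        entries.append((label_type, label))
    return flags, entries


value = encode_secondary_label_value([(TYPE_PS1, 24001)])
print(value.hex())
print(decode_secondary_label_value(value))
```

A receiver that does not recognize a given Type value could skip that 4-byte entry and continue with the next one, since every entry has the same fixed size in this layout.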
We will request IANA assignment for the secondary label attribute.¶
We have described two use-cases where the concept of a secondary label greatly helps in optimizing network resources and improving convergence, at the potential cost of additional label allocation. The advantages of the solutions based on the secondary label are their simplicity and the optimization and convergence improvements they bring to the network. There may be many other potential use-cases for the secondary label concept.¶
We request an IANA assignment for the secondary label attribute with code-type 71.¶
This document raises no new security issues for RT Constraints.¶
TBD.¶