Internet-Draft | ROSA | July 2023 |
Mendes, et al. | Expires 10 January 2024 | [Page] |
The proliferation of virtualization, microservices, and serverless architectures has made the deployment of services possible in more than one network location, alongside long-practised replication within single network locations, such as within a CDN datacentre. This creates a potential need to coordinate the steering of (client-initiated) traffic towards different services and their deployed instances across the network.¶
The term 'service-based routing' (SBR) captures the set of mechanisms for said traffic steering, positioned as an anycast problem, in that it requires the selection of one of the possibly many choices for service execution at the very start of a service transaction, followed by the transfer of packets to that chosen service endpoint.¶
This document provides typical scenarios for service-based routing, particularly those for which a more dynamic and efficient (in terms of both latency and signalling overhead) selection of suitable service execution endpoints would avoid the overheads, and thus latency penalties, experienced with existing explicit discovery methods. Related drafts introduce the design for an in-band service discovery method instead, named Routing on Service Addresses (ROSA), based on the insights from the use case and problem discussion in this draft.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 10 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Service provisioning in recent years has been largely driven by two trends. Firstly, virtualization has enabled service provisioning in more than one network location, progressing from virtual machines to containers, thus enabling sub-second service execution availability. Secondly, the cloud-native paradigm postulates agile development and integration of code, decomposing applications into smaller micro-services, to be deployed and scaled independently, yet chained towards a larger common objective. Micro-service deployment may be done following a serverless model where a third-party provider allocates resources to the micro-services in different network locations when an application is triggered and re-assigns them elsewhere when the application is no longer active. Such deployment flexibility allows services to be brought 'closer' to consumers, but also poses challenges such as the need for a service discovery and selection process that aligns with the needed dynamicity in selecting suitable service endpoints, with a particular emphasis on minimizing the latency from the initiating client request to the actual service response.¶
Service-level communication, captured through the term 'service-based routing' (SBR) throughout this document, has been realized with a decades-old DNS-based model to map service domains onto one of a set of IP addresses, often based on load or geo-information. Those IP addresses and port assignments identify network interfaces and sockets for service access. Contrasting against the aforementioned trends of evolved resource availability, deployment flexibility and location independence, those assignments typically remain static.¶
We recognize that the Internet community has developed solutions to cope with the limitations of the DNS+IP model, such as Global Server Load Balancing (GSLB) [GSLB], DNS over HTTPS [RFC8484], HTTP indirection [RFC7231] or, more recently, at transport level through QUIC-LB [I-D.ietf-quic-load-balancers]. At the routing level, [TIES2021] outlines a solution to map URL-based services onto a small set of IP addresses, utilizing virtual hosting techniques at the incoming Point-Of-Presence (PoP) to suitably distribute the request to the computational resource that may serve it. However, such solutions compound the centrality of service provisioning through Content Delivery Networks (CDNs).¶
This centralization of Internet services has been well observed, not just in IETF discussions [Huston2021] [I-D.nottingham-avoiding-internet-centralization], but also in other efforts that aim to quantify the centralization, using methods such as the Herfindahl-Hirschman Index [HHI] or the Gini coefficient [Gini]. Dashboards of the Internet Society [ISOC2022] confirm the dominant role of CDNs in service delivery beyond just streaming services, both in centralization as well as resulting market inequality, which has been compounded through the global CV19 pandemic [CV19].¶
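As a rough illustration of the two concentration measures named above, the following sketch computes the Herfindahl-Hirschman Index (sum of squared market shares) and the Gini coefficient for a list of market shares. The function names and the share values are hypothetical and chosen only for illustration; they are not taken from the cited dashboards or studies.

```python
from typing import Sequence


def hhi(shares: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index for market shares given as fractions summing to 1.

    Ranges from 1/n (n equally sized providers) to 1.0 (a single provider).
    """
    return sum(s * s for s in shares)


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Standard closed form over the ascending-sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical shares of four providers; not measured data.
    example_shares = [0.55, 0.25, 0.15, 0.05]
    print("HHI:", hhi(example_shares))
    print("Gini:", gini(example_shares))
```

Both measures grow as a market concentrates on fewer providers, which is the property the centralization studies referenced above rely on.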
While we recognize that many of the existing Internet services are well served with existing solutions, our key observation in this draft is that those existing solutions and overall developments equally create pain points for use cases in which the dynamic selection among the set of possible choices is a key requirement, together with the need to reduce service completion time, and thus minimize latencies for explicit resolution steps, while possibly improving resource utilization across all deployed service endpoints.¶
In the remainder of this document, we first introduce a terminology in Section 2 that provides the common language used throughout this document and all related drafts. We then follow with the use cases in Section 3, each one structured along a description of the experienced service functionality and the aforementioned pain points that may arise when utilizing existing service discovery and selection capabilities. We then summarize those pain points in Section 4, finally leading us to the formulation of a problem statement for service-based routing in Section 5.¶
The following terminology is used throughout the remainder of this document, as well as all the related drafts:¶
In the following, we outline examples of use cases that exhibit a degree of service distribution in which a service management scheme through explicit mapping and/or gatewaying may become complex and a possible hindrance for service performance. The following sections illustrate several examples, which complement other work, such as the BBF Metro Compute Networking (MCN) [MCN], which has developed similar but also additional use cases.¶
Video streaming is now the main contributor to the traffic observed in operators' networks. Multiple stakeholders, including operators and third party content providers, have been deploying Content Distribution Networks (CDNs), formed by a number of cache nodes spread across the network with the purpose of serving certain regions or coverage areas with a proper quality level. In such a deployment, protection schemes are defined in order to ensure service continuity even in the case of outages or starvation in cache nodes.¶
In addition to that, novel schemes of CDN interconnection [RFC6770] [SVA] are being defined allowing a given CDN to leverage the installed base of another CDN to complement its overall footprint.¶
As a result, several caches are deployed in different PoPs in the network. This means that for a given content requested by an end user, several of those caches could be candidate nodes for data delivery. From a service perspective (a service being defined either at the level of a video service, expressed as a service domain name or at the level of individual content streams), specific caches represent service instances, i.e., possible candidates to serve the content and thus realize the desired service.¶
Currently, the choice of the cache node to serve the customer relies solely on the content provider logic, considering only a limited set of conditions to apply. For instance, the usage of cache-control [RFC7234] allows data origins to indicate caching rules downstream. Similarly, Targeted Cache Control (TCC) [RFC9213] defines a convention for HTTP response header fields that allows cache directives to be targeted at specific caches or classes of caches. The original intent was quite limited: to operate between the data source and the data consumer (browser).¶
We can observe the following pain points when realizing such a scenario in today's available systems:¶
5G networks natively facilitate the decoupling of control and user plane. The 5G User Plane Function (UPF) connects the actual data coming over the Radio Access Network (RAN) to the Internet. Being able to quickly and accurately route packets to the correct destination on the Internet is key to improving efficiency and user satisfaction. For this, the UPF terminates the tunnel carrying end user traffic over the RAN, allowing such traffic to be routed in the 5G network towards its destination, e.g., providing reachability to edge computing facilities.¶
Currently, the UPF is planned to be deployed in two parts of the (5G) cellular system, namely in the Core Network and at the Edge as part of Multi-access Edge Computing (MEC). However, in a future 6G network, it is envisioned that several UPFs can be deployed in a more distributed manner, not only for covering different access areas, but also with the aim of providing access to different types of services, linked with the idea of network slicing as a means for tailored service differentiation, while also allowing for frontloading services to minimize latency.¶
For instance, some UPFs could be deployed very close to the access for services requiring either low latency or very high bandwidth, while others, requiring fewer service flows, could be deployed in a more centralized manner. Furthermore, multiple service instances could be deployed in different UPFs albeit scaled up and down differently, depending on the demand at a specific moment at the specific UPF (and its serving area).¶
Similarly to mobile access networks, fixed access solutions are proposing schemes for the separation of control and user plane for Broadband Network Gateway (BNG) elements [I-D.wadhwa-rtgwg-bng-cups] [BBF]. From the deployment point of view, different instances can be deployed based on different metrics such as coverage and temporary demand.¶
As a complement to both mobile and fixed access scenarios, edge computing capabilities are expected to complement the deployments for hosting services and applications of different purposes, both for services internal to the operator and for third party services.¶
We can observe the following pain points when realizing such a scenario based on today's available solutions:¶
Corporate services usually define requirements in terms of availability and resiliency. This is why multi-homing is common in order to diversify the access to services external to the premises of the corporation, or for providing interconnectivity of corporate sites (and access to internal services such as databases, etc).¶
A similar scenario, in which external services need to be reached from within a specific location, is the Connected Aircraft. Solutions that allow for the exploitation of multi-connected aircraft (e.g., several satellite connections, plus air-to-ground connectivity) are important to improve passenger experience, while helping make the crew more productive with networking solutions that enable seamless, high-speed broadband. Managing a multi-connected aircraft would benefit from mechanisms that would enable the selection of the best connection points based on service-specific semantics, besides the traffic related parameters considered by solutions such as SD-WAN, which aims to automate traffic steering in an application-driven manner, based on the equivalent of a VPN service between well defined points.¶
Multi-homing issues in connection with aircraft also extend to Unmanned Aircraft Systems (UAS). Rather than focusing on passenger experience, multi-homing over commercial off-the-shelf (COTS) communications modules such as 5G or IEEE 802.11 provides command, control and communications (C3) capabilities to Unmanned Aerial Vehicles (UAVs; drones). Here, regulatory frameworks mandate fail-over and minimum response times that require active management of connectivity to the aircraft.¶
An architectural approach common to the Connected Aircraft as well as UAS is to view network functions physically located on the aircraft as services, which are multi-homed due to the communications fail-over capabilities of the aircraft. Additionally, objects in flight will regularly change network attachment points for the same physical link, which may require updates to service routing information.¶
The diversity of providers requires considering service situations in a multi-domain environment, because of the interaction with multiple administrative domains.¶
From the service perspective, it seems necessary to ensure a common understanding of the service expectations and objectives independently of the domain traversed or the domain providing such a service. Common semantics can facilitate the assurance of the service delivery and a quick adaptation to changing conditions within a domain, or even across different domains.¶
The pain points for multi-homed and multi-domain services are:¶
Mobile applications usually install a monolithic implementation of the device-specific functionality, where this functionality may explicitly utilize remote service capabilities, e.g., provided through cloud-based services.¶
Application functionality may also be developed based on a micro-service architecture, breaking down the application into independent functions (services) that can work and communicate together. When such services are jointly deployed (i.e., installed) at the mobile device, its overall functionality resembles that of existing applications.¶
However, the services may also be invoked on network devices other than the mobile device itself, utilizing service-based routing capabilities to forward the service request (and its response) to the remote entity, effectively implementing an 'off-loading' capability. Efforts such as the BBF MCN work [MCN] capture this aspect as 'edge-to-edge collaboration', where in our case here the edge does include the end user devices themselves.¶
A distributed system developed based on a micro-service architecture inevitably introduces additional complexity as multiple independent services need to be synchronized in a way that allows them to work as a unified software system. If services are split across servers, that multi-faceted infrastructure will need to be provisioned not just in resource allocation but also in its steering of traffic across those resources. This is where a service-centric network solution able to coordinate the chain of such services could play an important role.¶
The work in [I-D.sarathchandra-coin-appcentres] proposes such a micro-service approach for mobile applications. The simple example in [I-D.sarathchandra-coin-appcentres] outlines the distribution of video reception, processing, and displaying capabilities as individual services across many network locations. As a result, display service instances may be switched very quickly based on, e.g., gaze control mechanisms, providing display indirection capabilities that utilize display hardware other than that of the original device, while image processing may be offloaded to one or more processing service instances; given the possible stateless nature of the processing, each individual video frame may be processed by another processing service instance to improve overall latency variance, as shown in [OnOff2022].¶
As also discussed in [I-D.sarathchandra-coin-appcentres], such micro-service design may well be integrated into today's application development frameworks, where a device-internal service registry would allow for utilizing device-local service instances first before directing the service invocation to the network, the latter relying on a service-based routing capability to steer the request to a 'suitable' service endpoint.¶
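The local-first invocation described above can be sketched as follows. This is a minimal illustration only: the class name, the method names and the `send_to_network` callback are hypothetical and do not correspond to any API defined in [I-D.sarathchandra-coin-appcentres] or in the ROSA drafts. The callback merely stands in for whatever service-based routing capability steers the request to a suitable endpoint.

```python
from typing import Callable, Dict

# A service handler takes a request payload and returns a response payload.
Handler = Callable[[bytes], bytes]
# Hypothetical stand-in for the network-side service-based routing capability.
NetworkSender = Callable[[str, bytes], bytes]


class DeviceServiceRegistry:
    """Device-internal registry that prefers device-local service instances."""

    def __init__(self, send_to_network: NetworkSender) -> None:
        self._local: Dict[str, Handler] = {}
        self._send_to_network = send_to_network

    def register_local(self, service_name: str, handler: Handler) -> None:
        """Record a service instance installed on this device."""
        self._local[service_name] = handler

    def invoke(self, service_name: str, payload: bytes) -> bytes:
        """Use a local instance if one exists, otherwise hand over to the network."""
        handler = self._local.get(service_name)
        if handler is not None:
            return handler(payload)
        return self._send_to_network(service_name, payload)


if __name__ == "__main__":
    # Fake network path used only to make the sketch runnable.
    def fake_network(name: str, payload: bytes) -> bytes:
        return b"remote:" + name.encode() + b":" + payload

    registry = DeviceServiceRegistry(fake_network)
    registry.register_local("display.example", lambda frame: b"local:" + frame)
    print(registry.invoke("display.example", b"frame-1"))
    print(registry.invoke("processing.example", b"frame-1"))
```

The design choice illustrated here is that the application code calls `invoke` in the same way regardless of where the service instance runs, which is what allows off-loading to remain transparent to the application.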
We can observe the following pain points when realizing such scenarios based on explicit discovery mechanisms:¶
Chunk-based video delivery is often constrained by, e.g., latency or playout requirements, while the content itself may be distributed as well as replicated across several network locations. Thus, it is required to steer client requests for specific content under specific constraints to one of the possibly many network locations at which the respective content may reside.¶
The work in [I-D.jennings-moq-quicr-arch] proposes a publish-subscribe metaphor that connects clients to a fixed infrastructure of relays for delivering the desired content under specific constraints. Within our context of service-based routing, the relays realize the selection of the 'right' service instance, deployed by different content providers, where this selection is being constrained by the requirements for the video's delivery to the client. However, the publish/subscribe operations in [I-D.jennings-moq-quicr-arch] manifest an explicit discovery step, plus require the deployment of an explicit relay overlay across possibly many network provider domains.¶
We can observe the following pain points when realizing such a scenario through explicit overlays such as those proposed by QUICr:¶
AR/VR scenarios often utilize stored content for delivering immersive experiences, albeit with interaction capabilities stemming from the nature of the used equipment, e.g., headsets. This interaction may lead to varying content retrieval patterns, e.g., due to early termination of an ongoing content retrieval caused by a user moving the headset and thus changing the field of view.¶
In addition, AR/VR is subject to stringent latency requirements. Among others, [I-D.liu-can-ps-usecases] outlines typical delay budgets for such scenarios. Thus, minimizing latencies for the overall delivery of each chunk is desirable.¶
Furthermore, the delivery of content to a group of clients often uses replicated storage, i.e., clients may be served from one of possibly many replicated content storages throughout the network. Given the stateless nature of content chunk retrieval in such a replicated setup, it may be desirable to make decisions of where to send a client request at EVERY chunk request per client.¶
Expressed in the notation of a queuing system, a system of N clients retrieves content chunks from k service instances, where each chunk request may be directed to any of the k instances; given the stateless nature of this service, any of the k instances is able to serve the chunk without knowledge of any previous one.¶
Current systems usually employ a load balancing system, which determines which content storage to use at the beginning of a session as part of the DNS lookup for the video server, using techniques such as Global Server Load Balancing (GSLB) [GSLB]. In the notation of a queuing system, each client then sees only one server, which serves N/k clients if there are k replicas and N clients overall.¶
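The difference between the two queuing views can be illustrated with textbook formulas, under assumptions that are not part of this document: chunk requests from all N clients are modelled as one Poisson stream with aggregate rate `lam`, each instance serves chunks with exponentially distributed service times at rate `mu`, per-session balancing is approximated as k independent M/M/1 queues each receiving `lam / k`, and per-request selection is idealised as a single M/M/k queue. The sketch below compares the mean response time of both models; all names and numbers are illustrative.

```python
from math import factorial


def mm1_response_time(lam: float, mu: float) -> float:
    """Mean response time (waiting plus service) of an M/M/1 queue."""
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (mu - lam)


def erlang_c(k: int, offered_load: float) -> float:
    """Probability that an arriving request has to wait in an M/M/k queue."""
    if offered_load >= k:
        raise ValueError("queue is unstable: offered load must be below k")
    rho = offered_load / k
    tail = offered_load ** k / factorial(k) / (1.0 - rho)
    head = sum(offered_load ** n / factorial(n) for n in range(k))
    return tail / (head + tail)


def mmk_response_time(lam: float, mu: float, k: int) -> float:
    """Mean response time (waiting plus service) of an M/M/k queue."""
    wait_probability = erlang_c(k, lam / mu)
    return wait_probability / (k * mu - lam) + 1.0 / mu


if __name__ == "__main__":
    lam, mu, k = 3.2, 1.0, 4  # illustrative values only
    per_session = mm1_response_time(lam / k, mu)   # each replica sees lam / k
    per_request = mmk_response_time(lam, mu, k)    # any replica may serve any chunk
    print("per-session model:", per_session)
    print("per-request model:", per_request)
```

Under these idealised assumptions the per-request model always yields a mean response time no larger than that of the per-session model at the same load, which is the intuition behind selecting an instance for every chunk request.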
We can observe the following pain points when realizing such a scenario in today's available systems that utilize per-session load balancing solutions:¶
The computing continuum is a crucial enabler of 5G and 6G networks as it supports the requirements of new applications, such as latency- and bandwidth-critical ones, using the available infrastructure. With the advent of new networks deployed beyond the edge, such as vehicular and satellite networks, researchers have begun investigating solutions to support the cloud-to-thing continuum, in which applications distribute logic (services) across the network following a micro-service architecture. In this scenario, storage, computing and networking resources are managed in a decentralized way between the cloud, the edge (most likely MEC) and the ad hoc network of moving devices, such as aircraft and satellites.¶
In this scenario, a serverless-based service architecture may be beneficial for the deployment and management of interdependent distributed computing functions, whose behavior and location can be redefined in real-time in order to ensure the continuous operation of the application. Serverless architecture is closely related to micro-services. The latter is a way to design an application and the former a way to run all or part of an application. That is the key to their compatibility. It is possible to code a micro-service and run it as a serverless function.¶
The combination of a microservice architecture with a serverless model is a driver for dynamicity in Cloud-to-Thing scenarios where a third-party cloud provider takes care of the deployment of all services encompassing each application. In this situation, as soon as the application code is triggered, the server allocates resources to all its services in different locations and withdraws them when the application is no longer active.¶
The consideration of serverless architectures is important for the Cloud-to-Thing continuum, since resources beyond the edge, in the ad hoc part of the continuum, may be constrained and intermittently available. Hence it makes sense to leverage a serverless architecture in which applications consist of a set of functions (services) that are not permanently available. On the contrary, services have a lifecycle: they are triggered, called, executed, and then removed as soon as they are no longer needed. Serverless services only run when they are needed, potentially saving significant resources.¶
In this scenario, the combination of a service-oriented data plane with a model capable of delegating and adapting serverless services in a Cloud-to-Thing continuum is important. The former needs to be aware of the presence of different services/functions in order to be able to execute applications based on the correct selection and invocation of different services, within their lifetime. Most importantly, this awareness of the services is likely to be highly dynamic in the nature of its distribution across network-connected nodes.¶
We can observe the following pain points when realizing such a scenario in today's available systems based on explicit mapping and/or gatewaying:¶
Large-scale interactive and networked real-time rendered three-dimensional Extended Reality (XR) spaces, such as the Metaverse, follow the assumption that applications will be hosted on platforms, similarly to current web and social media applications. However, the Metaverse is supposed to be more than the participation in isolated three-dimensional XR spaces. The Metaverse is supposed to allow the internetworking among a large number of XR spaces, although some problems have been observed such as lock-in effects, centralization, and cost overheads.¶
In spite of the general understanding about potential internetworking limitations, current technical discussions are ignoring the networking challenges altogether. From a networking perspective, it is expected that the Metaverse will challenge traditional client-server inspired web models, centralized security trust anchors and server-style distributed computing, due to the need to take into account interoperability among a large number of XR spaces, low latency and the envisioned Metaverse pervasiveness.¶
Current Metaverse platforms rely on web protocols and cloud services, but suffer from performance limitations when interconnecting XR spaces. Some of the challenges involve consistent throughput to handle high resolution XR applications, and fast response times to computational requests. This leads to the need to bring cloud computing and storage resources towards the edge to reduce long round trip times.¶
To support Metaverse low latency requirements taking into account the constrained resources of heterogeneous devices in the Cloud-to-Thing continuum, a service-centric networking framework should be based on micro-services executed as serverless services/functions inside selected devices. The motivation to look at serverless functions is related to their capability to simplify service management on heterogeneous devices.¶
In this context, an open and decentralized Metaverse, able to allow the internetworking of a large number of XR spaces, may be supported by intertwining distributed computing and networking. Hence it is expected that Metaverse applications may gain from a service-centric network framework able to support the execution of services while taking advantage of storage, networking, and computing resources located as close as possible to users, with a dynamic assignment of client requests to those resources.¶
While the usage of isolated XR spaces is currently a reality, the deployment of a large scale Metaverse should rely on a decentralized networking framework, of which Distributed Ledger Technology (DLT) is a major driver, facilitating the deployment of several Metaverse features such as streaming of payments and NFTs that make digital ownership possible in the Metaverse. Moreover, DLT makes it possible to identify oneself in a secure way in the Metaverse, being also a major Web3.0 building block. Web3.0 builds Internet services on decentralized platforms, where the ownership of the platform is tokenized and the users' own tokens are calculated based on their contribution to the platform. For instance, Web3.0 domain names are DLT-based DNS addresses that allow users to create and manage their own personalized domains.¶
Development of DLT based on a service-centric networking approach brings several benefits. To start with, designing DLT applications as microservices allows many software engineering initiatives to run in parallel, reduces dependencies between software development, and allows for the support of multiple technologies, languages and frameworks. Moreover, developing DLT on a service-centric networking framework may help to solve the DLT scalability problem, allowing the implementation of data sharding techniques, in which the storage of the ledger and/or the data used to recreate the ledger is divided across many shards, which are distributed between different devices. This process reduces individual nodes' storage requirements at any given time to that of a single shard or small set of shards. A service-centric networking approach may also support the need for data availability sampling, providing a method for the network to check that data is available without putting too much strain on any individual node. This will allow DLT to prove that historical data needed to reconstruct part of the ledger was available at one point (i.e., when the block was produced) without nodes actually having to download all the data themselves.¶
We can observe the following pain points when realizing such a scenario in today's available systems based on explicit mapping and/or gatewaying:¶
The BBF MCN use case report [MCN] outlines 'popularity' as a criterion to move from current explicit indirection-based approaches (such as DNS, GSLB, or ALTO) to active service-based routing approaches.¶
Here, popularity, e.g., measured in service usage over a period of time, is being used as a trigger to announce a popular service to an active service-based routing platform, while less popular services continue to be served via existing (e.g., DNS-based) methods. Equally, services may be unannounced, thus retracted, from the service-based routing overlay to better control the overall cost for the provisioning of the service-based routing overlay.¶
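The popularity trigger described above can be sketched as a simple threshold rule. The function name, the two thresholds and the service names below are hypothetical; neither [MCN] nor the ROSA drafts are claimed to define such an interface. Two thresholds are assumed here, rather than one, purely so that a service whose usage hovers around the announcement level is not repeatedly announced and retracted.

```python
from typing import Dict, Set


def update_announcements(
    usage: Dict[str, int],
    announced: Set[str],
    announce_at: int,
    withdraw_at: int,
) -> Set[str]:
    """Return the set of services to keep announced to the routing overlay.

    usage maps a service name to its request count over the last measurement
    period. A service is announced once its usage reaches announce_at and is
    retracted once its usage drops below withdraw_at; in between, its current
    state is kept.
    """
    if withdraw_at > announce_at:
        raise ValueError("withdraw_at must not exceed announce_at")
    result = set(announced)
    for service in set(usage) | set(announced):
        count = usage.get(service, 0)  # no recorded usage counts as zero
        if count >= announce_at:
            result.add(service)
        elif count < withdraw_at:
            result.discard(service)
    return result


if __name__ == "__main__":
    # Illustrative request counts for one measurement period.
    usage = {"video.example": 1200, "maps.example": 300, "blog.example": 5}
    announced = {"maps.example", "blog.example"}
    print(sorted(update_announcements(usage, announced, 1000, 100)))
```

Services that are not in the returned set would continue to be served via existing (e.g., DNS-based) methods.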
With this, one could foresee the provisioning of a service-based routing overlay, such as ROSA, as an optimization for a CDN platform provider, either through commercially interfacing to a separate ROSA provider or providing the ROSA domain itself.¶
We can observe the following pain points when realizing such a scenario in today's available systems based on explicit mapping and/or gatewaying:¶
Data access of any kind, be it for personal or curated content or for social media, has become essential to our lives, yet its implementation is fraught with problems. Content as well as service hosts are forced to use CDNs to effectively distribute their data, or choose to rely on one of the big platforms entirely. As a result, the transport from host to receiver is overseen by a conglomerate of giant multi-national corporations, as also observed in various Internet metrics such as the Gini coefficient or the HHI. For an end user, data governance, but also the realization of the significant (often cloud) infrastructure of those corporations, is thus difficult to oversee.¶
As a result, this mode of organizing data transport has created structural inefficiencies in our service provisioning infrastructure, e.g., for distributed, end-user-created video content. In contrast, a public video streaming infrastructure, which takes content from various hosts and distributes it in an efficient fashion without involvement of a centralized entity, may be preferable from a data governance and ownership standpoint, while still maintaining the desired service quality. Yet, dominant video streaming providers are not incentivized to develop such technologies, since doing so would lower the barrier to entry for competitors. Instead, if the necessary technologies were developed, big economic blocs like the EU could commission the creation of such an infrastructure on their territory and even incentivize its use to foster decentralization and localized data governance. Such an undertaking could possibly both reduce the resource footprint for service provisioning and open the heavily concentrated market of service provisioning platforms.¶
We envision, for instance for accessing a video, that a user would access a service address, which in turn would be resolved to a regional service instance. This instance would either use local caches or connect to the wider video streaming infrastructure to retrieve the requested video in the most efficient manner. Within the video streaming infrastructure, techniques such as proximal caching or multicasting could be used to minimize resource usage.¶
Key here is not the ability to build such a service provisioning infrastructure per se, but to link the resolution of the service address to an IP address to a service-category-specific resolution overlay that not only reduces the latencies experienced in today's DNS systems but can also be deployed entirely independently of large corporations, instead being operated by decentralized communities, such as, for instance, the 'fediverse'.¶
We can observe the following pain points when realizing such a scenario in today's available PoP-based systems:¶
Web browsing remains an important usage of the Internet, including during mobile use. Whether it is browsing through pages of places, e.g., linked through mapping services, or viewing the results of a search performed before, users often view and thus access pages on the Internet through the HTTP protocol suite. This is unlike, e.g., social media or over-the-top video services, which are often subject to strict traffic engineering to ensure a superior user experience and are mainly accessed through dedicated, e.g., mobile, applications. However, for web browsing as outlined here, content delivery networks (CDNs) may be used for frequently visited websites, utilizing CDNs as large web caches to improve page loading times.¶
Key to the browsing experience is that webpages include links, often to other sites, for additional content. For instance, in 2019, a web page loaded on a desktop included on average 70 resources (75 for a mobile page) [MACHMETRIC], many of which may require their own DNS resolution if pointing to other URLs than those previously resolved (within the browsed page or in other pages visited before). Further, according to [MACHMETRIC], the time to first byte (TTFB) was 1.28s for a desktop and 2.59s for mobile pages in the same year, while it took on average about 4.7s to load the overall page, with 11.9s for a mobile page.¶
Key here is that the DNS latency for resolving one URL may significantly accumulate due to the many objects a web page may include. While CDNs reduce page loading time, Internet-based resources (thus those not hosted by the local CDN), still require resolving the URL, often at significantly higher latency than the CDN-based resolver; with [OnOff2022] positioning Internet resources at more than 100ms to resolve through the DNS, while CDN-hosted resources may be resolved within 15 to 45ms.¶
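The accumulation effect can be made concrete with a small back-of-the-envelope calculation. The per-lookup latencies below are the ones quoted above; the number of distinct hostnames per page (`HOSTS`) and the degree of parallelism are assumptions chosen purely for illustration.

```python
import math


def cumulative_dns_ms(distinct_hosts: int, per_lookup_ms: float,
                      parallel_lookups: int = 1) -> float:
    """Wall-clock time spent on DNS when lookups are issued in batches."""
    batches = math.ceil(distinct_hosts / parallel_lookups)
    return batches * per_lookup_ms


HOSTS = 20  # assumption: distinct hostnames among the ~70 resources of a page

internet_sequential = cumulative_dns_ms(HOSTS, 100)   # 100 ms per lookup, one at a time
internet_parallel = cumulative_dns_ms(HOSTS, 100, 6)  # assumption: six lookups in flight
cdn_best = cumulative_dns_ms(HOSTS, 15)               # lower CDN bound, sequential
cdn_worst = cumulative_dns_ms(HOSTS, 45)              # upper CDN bound, sequential
```

Under these assumptions, 20 sequential lookups at 100ms each add up to 2s of resolution time, and even six-way parallelism leaves 400ms before the last transfer can start.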
We can observe the following pain points when realizing such a scenario in today's available POP-based systems:¶
From the above, we can identify the explicit resolution step, which requires a lookup request and response before the actual HTTP-based transfer may commence, as a key factor affecting the page retrieval time (we note that other aspects, like client rendering and server performance, also affect the overall page loading time, but these lie outside the scope of the discussion here).¶
With the above in mind, we postulate that an in-band signalling of URL to IP mapping requests may significantly reduce the overall page retrieval time, particularly for those scenarios in which no other traffic engineering methods, such as the careful balancing between CDN caches, are applied, as is usual for popular sites.¶
In a preliminary evaluation of such in-band benefits, we positioned the in-band element, realizing the functionalities outlined in [I-D.trossen-rtgwg-rosa-arch] as the Service Access Router and the Service Access Gateway, at the CDN ingress. This enables access to ROSA-hosted resources as well as resources hosted by both the CDN and the wider Internet through the same CDN ingress point.¶
We assumed a client-CDN RTT of 20ms and were able to show a reduction of up to 60% in page retrieval time in a simple model where a single page is retrieved, followed by a parallelized retrieval of all objects included in the initial page. Further, the time-to-first-byte (i.e., the retrieval of the initial object of up to 14kB size) was reduced by up to 70% for CDN-hosted objects. Although those results are preliminary, they outline the potential that moving from explicit resolution to in-band resolution could bring.¶
Several observations can be drawn from the use case examples in the previous section regarding their technical needs:¶
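The details of the evaluation model are not given here, so the following sketch is only one possible reading of "a single page, followed by a parallelized retrieval of all objects". It is not the model behind the figures above and is not expected to reproduce them. The `Obj` type, the assumption that each fetch costs exactly one client-CDN RTT, and the assumption that in-band resolution adds no extra round trip are all simplifications introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Obj:
    resolve_ms: float  # explicit DNS lookup time for this object's host
    fetch_ms: float    # one request/response exchange to fetch the object


def explicit_ms(page: Obj, embedded: List[Obj]) -> float:
    """Resolve and fetch the page, then resolve and fetch all objects in parallel."""
    rest = max((o.resolve_ms + o.fetch_ms for o in embedded), default=0.0)
    return page.resolve_ms + page.fetch_ms + rest


def in_band_ms(page: Obj, embedded: List[Obj]) -> float:
    """Same transfers, with resolution folded into the first request (assumed free)."""
    rest = max((o.fetch_ms for o in embedded), default=0.0)
    return page.fetch_ms + rest


RTT = 20.0  # client-CDN RTT taken from the text
page = Obj(resolve_ms=30.0, fetch_ms=RTT)       # assumption: CDN-hosted page
embedded = [Obj(30.0, RTT), Obj(100.0, RTT)]    # one CDN-hosted, one Internet-hosted

t_explicit = explicit_ms(page, embedded)
t_in_band = in_band_ms(page, embedded)
```

The sketch makes the structural point only: with parallel object retrieval, the slowest explicit resolution sits on the critical path twice, once for the page and once for the embedded objects.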
We can conclude from our observations above that (i) distribution (of service instances), (ii) dynamicity in the availability of and choosing the 'best' service instance, and (iii) efficiency in utilizing the best possible service instance are crucial issues for our use cases.¶
This document presented a number of use cases for service-based routing. Common across all those use cases is the inherent need for a dynamic anycast decision, i.e., the frequent (re-)assignment of service instances among a set of possible service endpoints.¶
Additionally, this (re-)assignment is driven by service-specific policies that capture not just performance-oriented metrics but also possible user-centric interactions with other services, which are jointly composed towards a larger, chained experience.¶
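A policy-driven anycast decision of this kind can be sketched as a scoring function applied to the currently known instances. The `Instance` fields, the two example policies, and their weights are hypothetical; the locators are taken from documentation address ranges.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Instance:
    locator: str                 # IP locator of the service instance
    latency_ms: float            # performance-oriented metric
    load: float                  # utilisation in [0, 1]
    colocated_with_chain: bool   # crude proxy for interaction with a chained service


Policy = Callable[[Instance], float]  # lower score is better


def select(instances: Iterable[Instance], policy: Policy) -> Instance:
    """Anycast decision: pick the instance with the lowest policy score."""
    return min(instances, key=policy)


def latency_only(i: Instance) -> float:
    return i.latency_ms


def chained_experience(i: Instance) -> float:
    # Assumed weights: penalise load and instances far from the chained service.
    penalty = 0.0 if i.colocated_with_chain else 50.0
    return i.latency_ms + 100.0 * i.load + penalty


a = Instance("192.0.2.10", latency_ms=10.0, load=0.9, colocated_with_chain=False)
b = Instance("198.51.100.20", latency_ms=25.0, load=0.2, colocated_with_chain=True)

first = select([a, b], latency_only)         # picks a: lowest latency
second = select([a, b], chained_experience)  # picks b: score 45 versus 150

b.latency_ms = 200.0                         # conditions change ...
third = select([a, b], chained_experience)   # ... and the assignment moves back to a
```

The same set of instances yields different choices under different service-specific policies, and the choice changes as soon as the underlying metrics do, which is the dynamicity the use cases call for.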
Existing methods, such as DNS, ALTO, and others, already handle the (re-)assignment between service name and routing locator. For this, they employ an out-of-band resolution step, initiated by the client for whatever service it wants to use, which returns the chosen IP address to the client. The client then initiates direct communication with the now resolved IP address of the chosen service instance. This method has been well proven for the many services that exist in the Internet today.¶
However, we must also note that those resolution steps incur explicit resolution latencies that add to the end-to-end communication between client and service instance. Furthermore, solution-specific lags may exist in updating the name-locator assignments, while each resolution solution supports its specific application identifier domain, such as domain names (DNS), URLs (ALTO) or others. In our use cases, these issues, together with others, cause problems for the realization and performance of the use cases and/or the user experience they set out to offer.¶
WHAT IF a similar end-to-end procedure of data communication between a client and a 'best' choice of service instance (out of a set of possibly many) existed that significantly reduced the aforementioned latency, while allowing the assignments to be updated at rates that are more aligned with the possibility to establish new service instances in distributed locations?¶
We assert that the following problems need to be addressed in providing such an improved procedure:¶
We argue that existing solutions do not provide adequate answers to the above problems. We examine this in more depth in a separate gap analysis, which leads us to formulate requirements for possible answers in the same draft, with a first proposal for a solution framework and architecture in a separate document.¶
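For the DNS case, the two-step procedure described above looks roughly as follows when written against the Python standard library. The helper names `resolve` and `connect` are ours; `socket.getaddrinfo` stands in for the out-of-band resolution step, and taking the first returned entry is a simplification.

```python
import socket
import time


def resolve(host: str, port: int):
    """Step 1: explicit resolution; returns (elapsed_ms, addrinfo entry)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, infos[0]  # simplification: use the first entry only


def connect(info, timeout: float = 3.0) -> socket.socket:
    """Step 2: direct communication with the resolved locator."""
    family, socktype, proto, _canonname, sockaddr = info
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock
```

A client would call `resolve(name, 443)` and then `connect(info)`; the value of `elapsed_ms` is the explicit resolution latency that precedes any service traffic.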
Flexible and even highly dynamic service-based routing is key for a number of emerging and existing use cases, as we outlined in this draft.¶
As we outlined with a range of use cases, there exist a number of issues when realizing those use cases, leading us to formulate a problem statement for needed work in the IETF to identify adequate answers. In our companion documents, we present our current understanding of the shortcomings of existing solutions to SBR, together with requirements for a possible improved answer to those problems.¶
To facilitate the decision between service information (i.e., the service address) and the IP locator of the selected service instance, information needs to be provided to the ROSA service address routers. This is similar to the process of resolving domain names to IP locators in today's solutions, such as the DNS. As with the latter techniques, privacy in terms of which services the initiating client is communicating with needs to be preserved against the traversed underlay networks. For this, suitable encryption of sensitive information needs to be provided as an option. Furthermore, we assume that the choice of ROSA overlay to use for the service to locator mapping is similar to that of choosing the client-facing DNS server, and is thus configurable by the client, including falling back to the DNS for those cases where services may be announced to ROSA methods and DNS-like solutions alike.¶
This draft does not request any IANA action.¶
Many thanks go to Ben Schwartz, Mohamed Boucadair, Tommy Pauly, Joel Halpern, Daniel Huang, Peng Liu, Hannu Flinck, and Russ White for their comments on the text to clarify several aspects of the motivation for and technical details of ROSA.¶
Johann Schoepfer Email: johann.alban.schoepfer@gmail.com Emilia Ndilokelwa Weyulu Email: emilia.ndilokelwa.weyulu@huawei.com¶