Internet-Draft | Data Minimization | July 2023 |
Arkko | Expires 11 January 2024 |
Data minimization is an important privacy technique, as it can reduce the amount of information exposed about a user. This document emphasizes the need for data minimization among primary protocol participants, such as between clients and servers. Avoiding data leakage to outside parties is of course very important as well, but minimization needs to consider both.¶
Minimization among participants is necessary to protect against endpoints that are compromised, malicious, or whose interests simply do not align with the interests of users. It is important to consider the role of a participant and limit any data provided to it according to that role.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 January 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Privacy has been at the center of many activities in the IETF. Privacy and its impact on protocol development activities at the IETF are discussed in [RFC6973], which covers a number of topics, from understanding privacy threats to threat mitigation, including data minimization.¶
This document emphasizes the need for data minimization among primary protocol participants, such as between clients and servers. Avoiding data leakage to outside parties such as observers or attackers is of course very important as well, but minimization needs to consider both.¶
As RFC 6973 states:¶
"Limiting the data collected by protocol elements to only what is necessary (collection limitation) is the most straightforward way to help reduce privacy risks associated with the use of the protocol."¶
This document offers some further discussion, recommendations, and clarifications on this point. It suggests that limiting the sharing of data to the protocol participants is a key technique in limiting the data collection mentioned above. It is important that minimization happens prior to disclosing information to another party, rather than relying on the good will of the other party to avoid storing the information.¶
This is necessary to protect against endpoints that are compromised, malicious, or whose interests simply do not align with the interests of users. It is important to consider the role of a participant and limit any data provided to it according to that role.¶
Even closed, managed networks may have compromised nodes, justifying careful consideration of what information is provided to different nodes in the network. And in all networks, increased use of communication security means adversaries may resort to new avenues of attack. New adversaries and risks have also arisen, e.g., due to the increasing amount of information stored in various Internet services. And in situations where interests do not align across the protocol participants, relying on a participant that is itself interested in data collection to limit its own collection may not be sufficient.¶
Careful control of information is also useful for technology evolution. For instance, allowing a party to unnecessarily collect or receive information may lead to an effect similar to that described in [RFC8546] for protocols: regardless of initial expectations, over time unnecessary information will get used, leading to, for instance, ossification. Systems end up depending on having access to exactly the same information as they had access to previously. This makes it hard to change what information is provided or how it is provided.¶
The Principle of Least Privilege [PoLP] is applicable:¶
"Every program and every user of the system should operate using the least set of privileges necessary to complete the job."¶
In this context, it is recommended that the protocol participants minimize the information they share. That is, they should provide each other only the information that is necessary for the function that the other party is expected to perform.¶
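The recommendation above can be illustrated with a minimal sketch, assuming a sender that knows the role of each recipient. The roles, the field names, and the `minimized_view` helper are hypothetical and chosen only for illustration; they are not taken from any protocol. The point of the sketch is that filtering happens on the sender's side, before anything is disclosed.¶

```python
from dataclasses import dataclass, asdict

# Hypothetical mapping from a recipient's role to the fields that role
# needs for its function. Roles and field names are illustrative
# assumptions only.
FIELDS_NEEDED_BY_ROLE = {
    "payment_processor": {"order_id", "amount", "card_token"},
    "shipping_provider": {"order_id", "name", "street_address"},
    "analytics": {"amount"},
}


@dataclass
class Order:
    order_id: str
    name: str
    street_address: str
    email: str
    amount: int
    card_token: str


def minimized_view(order: Order, role: str) -> dict:
    """Return only the fields that the given role needs.

    Unknown roles receive nothing: the default is to share no data.
    """
    allowed = FIELDS_NEEDED_BY_ROLE.get(role, set())
    return {key: value for key, value in asdict(order).items() if key in allowed}
```

In this sketch a recipient with an unlisted role receives an empty result, so adding a new recipient requires an explicit decision about what it needs.¶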
Information sharing may relate to different types of protocol exchanges, e.g., interaction of an endpoint with outsiders, the network, or intermediaries.¶
Other documents address aspects related to networks ([RFC8546], [RFC8558], [I-D.iab-path-signals-collaboration]). Thomson [I-D.thomson-tmi] discusses the role of intermediaries. Communications security largely addresses observers and outsider adversaries; see, for instance, [Confidentiality], [RFC7858], [RFC8446], [RFC8484], [RFC9000]. And [RFC6973] discusses associated traffic analysis threats.¶
The focus in this document is on the primary protocol participants, such as a server in a client-server architecture or a service that enables some kind of interaction among groups of users.¶
As with communication security, we try to avoid providing too much information, as it may be misused or leak through attacks. The same principle applies not just to routers and potential attackers on path, but also to many other services in the Internet, including servers that provide some function.¶
The use of identifiers has been extensively discussed in [RFC6973].¶
Note that indirectly inferred information can also end up being shared, such as message arrival times or patterns in the traffic flow ([RFC6973]). Information may also be obtained from fingerprinting the protocol participants, in an effort to identify unique endpoints or users. Information may also be combined from multiple sources, e.g., websites and social media systems collaborating to identify visiting users [WP2021].¶
The most straightforward approach is of course to avoid sending a particular piece of information at all.¶
Alternatively, the information can be encrypted to very specific recipients, even if the encrypted message is shared with a broader set of protocol participants. For instance, a client can encrypt a message only to the actual final recipient, even if the server holds the message before it is delivered.¶
Architectural note: A transport connection between two components of a system is not an end-to-end connection even if it encompasses all the protocol layers up to the application layer. It is not end-to-end if the information or control function it carries extends beyond those components. Just because an e-mail server can read the contents of an e-mail message does not make it a legitimate recipient of the e-mail.¶
This document recommends that information should not be disclosed, stored, or routed in cleartext through services that do not need to have that information for the function they perform.¶
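The case of a server that holds a message before delivery, without being a recipient of it, can be sketched as follows. This is a toy model under stated assumptions: the `Relay`, `Envelope`, `seal`, and `open_envelope` names are hypothetical, and a one-time pad shared out of band between sender and recipient stands in for a real public-key or authenticated encryption scheme so that the example needs only the standard library. The relay receives just the routing metadata it needs plus an opaque body.¶

```python
import secrets
from dataclasses import dataclass, field


def new_pad(length: int) -> bytes:
    """Generate a random pad; it must be used for one message only."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR data with an equally long pad; applying it twice restores data."""
    if len(data) != len(pad):
        raise ValueError("pad must be exactly as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


@dataclass
class Envelope:
    recipient: str     # routing metadata: what the relay needs to deliver
    ciphertext: bytes  # opaque to the relay, which never holds the pad


@dataclass
class Relay:
    """Hypothetical store-and-forward server that only sees envelopes."""
    mailboxes: dict = field(default_factory=dict)

    def accept(self, envelope: Envelope) -> None:
        self.mailboxes.setdefault(envelope.recipient, []).append(envelope)

    def deliver(self, recipient: str) -> list:
        return self.mailboxes.pop(recipient, [])


def seal(recipient: str, plaintext: bytes, pad: bytes) -> Envelope:
    """Sender side: encrypt before the message leaves the sender."""
    return Envelope(recipient, xor_bytes(plaintext, pad))


def open_envelope(envelope: Envelope, pad: bytes) -> bytes:
    """Recipient side: only a holder of the pad can recover the plaintext."""
    return xor_bytes(envelope.ciphertext, pad)
```

Even in this sketch the relay still learns the recipient name, the message length, and the arrival time, which is the kind of indirectly inferred information that remains after the content itself is protected. The one-time pad also provides no integrity protection, which a real design would need.¶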
Where the above methods are not possible due to the information being necessary for a function that the user wishes to be performed, there are still methods to set limits on the information sharing.¶
Kühlewind et al. discuss the concept of Privacy Partitioning [I-D.iab-privacy-partitioning]. This may involve designs where no single party has all the information, such as Oblivious DNS [I-D.annee-dprive-oblivious-dns], [I-D.pauly-dprive-oblivious-doh] or Oblivious HTTP [I-D.ietf-ohai-ohttp], cryptographic designs such as those in the recent IETF PPM effort [I-D.ietf-ppm-dap], and so on.¶
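One way to see how a design can ensure that no single party has all the information is additive secret sharing, the basic idea behind private aggregation designs of this kind. The following is a simplified sketch, not the protocol specified in [I-D.ietf-ppm-dap]: it omits input validation, authentication, and transport, and the `split`, `aggregate`, and `combine` names and the choice of modulus are assumptions made for illustration.¶

```python
import secrets

# Illustrative modulus; for plain additive sharing it only needs to be
# larger than the largest total that can occur.
MODULUS = 2**61 - 1


def split(measurement: int, parties: int = 2) -> list[int]:
    """Client side: split a measurement into additive shares.

    All shares but the last are uniformly random, so any proper subset
    of the shares reveals nothing about the measurement.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (measurement - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(shares_received: list[int]) -> int:
    """Aggregator side: sum only the shares this aggregator received."""
    return sum(shares_received) % MODULUS


def combine(partial_sums: list[int]) -> int:
    """Collector side: add the partial sums to obtain the overall total."""
    return sum(partial_sums) % MODULUS
```

Each client would send one share to each aggregator. An aggregator sees only random-looking values, and the collector sees only the total, provided the aggregators do not collude.¶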
Of course, participants may provide more information to each other after careful consideration, e.g., information provided in exchange for some benefit, or to parties that are trusted by the participant.¶
The general topic of ensuring that protocol mechanisms stay evolvable and workable is covered in [I-D.iab-use-it-or-lose-it]. But the associated methods for reducing fingerprinting possibilities probably deserve further study [Fingerprinting] [AmIUnique]. [I-D.wood-pearg-website-fingerprinting] discusses one aspect of this.¶
The author would like to thank the participants of various IAB workshops and programs, and IETF discussion list contributors for interesting discussions in this area. The author would in particular like to acknowledge the significant contributions of Martin Thomson, Nick Doty, Alissa Cooper, Stephen Farrell, Mark McFadden, John Mattsson, Chris Wood, Dominique Lazanski, Eric Rescorla, Russ Housley, Robin Wilton, Mirja Kühlewind, Tommy Pauly, Jaime Jiménez and Christian Huitema.¶
This work has been influenced by [RFC6973], [RFC8980], [I-D.farrell-etm], [I-D.arkko-arch-internet-threat-model-guidance], and [I-D.lazanski-smart-users-internet].¶