1. Introduction
This document defines a simple API for browsers that enables the collection of aggregated, differentially-private metrics.
The primary goal of this API is to enable attribution for advertising.
1.1. Attribution
In advertising, attribution is the process of identifying actions that precede an outcome of interest, and allocating value to those actions.
Actions that are of interest to advertisers are primarily the showing of advertisements (also referred to as impressions). Other actions include ad clicks (or other interactions) and opportunities to show ads that were not taken.
Desired outcomes for advertising are more diverse, as they include any result that an advertiser seeks to improve through the showing of ads. A desirable outcome might also be referred to as a conversion, which refers to "converting" a potential customer into a customer. What counts as a conversion could include sales, subscriptions, page visits, and enquiries.
For this API, actions and outcomes are both events: things that happen once. What is unique about attribution for advertising is that these events might not occur on the same site. Advertisements are most often shown on sites other than the advertiser’s site.
The primary challenge with attribution is in maintaining privacy. Attribution involves connecting activity on different sites. The goal of attribution is to find an impression that was shown to the same person before the conversion occurred.
If attribution information were directly revealed, it would enable unwanted cross-context recognition, thereby enabling tracking.
This document avoids cross-context recognition by ensuring that attribution information is aggregated using an aggregation service. The aggregation service is trusted to compute an aggregate without revealing the values that each person contributes to that aggregate.
Strict limits are placed on the amount of information that each browser instance contributes to the aggregates for a given site. Differential privacy is used to provide additional privacy protection for each contribution.
Details of aggregation service operation are included in § 6 Aggregation. The differential privacy design used is outlined in § 7 Differential Privacy.
1.2. Background
From the early days of the Web, advertising has been widely used to financially support the creation of sites.
One characteristic that distinguished the Web from other venues for advertising was the ability to obtain information about the effectiveness of advertising campaigns.
Web advertisers were able to measure key metrics like reach (how many people saw an ad), frequency (how often each person saw an ad), and conversions (how many people saw the ad then later took the action that the ad was supposed to motivate). In comparison, these measurements were far more timely and accurate than for any other medium.
The cost of measurement performance was privacy. In order to produce accurate and comprehensive information, advertising businesses performed extensive tracking of the activity of all Web users. Each browser was given a tracking identifier, often using cookies that were lodged by cross-site content. Every action of interest was logged against this identifier, forming a comprehensive record of a person’s online activities.
Having a detailed record of a person’s actions allowed advertisers to infer characteristics about people. Those characteristics made it easier to choose the right audience for advertising, greatly improving its effectiveness. This created a strong incentive to gather more information.
Online advertising is intensely competitive. Sites that show advertising seek to obtain the most money for each ad placement. Advertisers seek to place advertising where it will have the most effect relative to its cost. Any competitive edge gained by these entities—and the intermediaries that operate on their behalf—depends on having more comprehensive information about a potential audience.
Over time, actions of interest expanded to include nearly every aspect of online activity. Methods were devised to correlate that information with activity outside of the Web. An energetic trade has formed, with multiple purveyors of personal information that is traded for various purposes.
1.3. Goals
The goal of this document is to define a means of performing attribution for advertising that does not enable tracking.
1.4. End-User Benefit
The measurement of advertising performance creates new cross-site flows of information. That information flow creates a privacy risk or cost—of cross-context recognition—that needs to be justified in terms of benefits to end users.
Any benefits realized by end users through the use of attribution are indirect.
End users that visit a website pay for "free" content or services primarily through their attention to any advertisements the site shows them. This "value" accrues to the advertiser, who in turn pays the site. The site is expected to use this money to support the provision of their content or services.
Participation in an attribution measurement system would constitute a secondary cost to Web users.
Support for attribution enables more effective advertising, largely by informing advertisers about what ads perform best, and in what circumstances. Those circumstances might include the time and place that the ad is shown, the person to whom the ad is presented, and the details of the ad itself.
Connecting that information to outcomes allows an advertiser to learn what circumstances most often lead to the outcomes they most value. That allows advertisers to spend more on effective advertising and less on ineffective advertising. This lowers the overall cost of advertising relative to the value obtained. [ONLINE-ADVERTISING]
Sites that provide advertising inventory, such as content publishers and service providers, indirectly benefit from more efficient advertising. Venues for advertising that are better able to show ads that result in the outcomes that advertisers seek can charge more for ad placements.
Sites that obtain support through the placement of advertisements are better able to provide quality content or services. Importantly, that support is derived unevenly from their audience. This can be more equitable than other forms of financial support. Those with a lower tendency or ability to spend on advertised goods obtain the same ad-supported content and services as those who can afford to pay. [EU-AD][COPPACALYPSE]
The ability to supply "free" services supported by advertising has measurable economic benefit that derives from the value of those services. [FREE-GDP]
1.5. Collective Privacy Effect
The use of aggregation—if properly implemented—ensures that information provided to sites is about groups and not individuals.
The introduction of this mechanism therefore represents collective decision-making, as described in Privacy Principles § collective-privacy.
Participation in attribution measurement carries a lower privacy cost when the group that participates is larger. This is due to the effect of aggregation on the ability of sites to extract information about individuals from aggregates. This is especially true for central differential privacy, which is the mathematical basis for the privacy design used in this specification.
Larger cohorts of participants also produce more representative—and therefore more useful—statistics about the advertising that is being measured.
If attribution is justified, both these factors motivate the enablement of attribution for all users.
Acting to enable attribution measurement by user agents will not be positively received by some people. Different people perceive the costs and benefits that come from engaging with advertising differently. The proposed design allows people the option of appearing to participate in attribution without revealing that choice to sites; see § 3.7.1 Optional Participation.
1.6. Attribution Using Histograms
Attribution attempts to measure correlation between one or more ad placements (impressions) and the outcomes that an advertiser desires.
When considered in the aggregate, information about individuals is not useful. Actions and outcomes need to be grouped.
The simplest form of attribution splits impressions into a number of groupings according to the attributes of the advertisement and counts the number of conversions. Groupings might be formed from attributes such as where the ad is shown, what was shown (the "creative"), when the ad was shown, or to whom.
These groupings and the tallies of conversions attributed to each form a histogram. Each bucket of the histogram counts the conversions for a group of ads.
Different groupings might be used for different purposes. For instance, grouping by creative (the content of an ad) might be used to learn which creative works best.
Adding a value greater than one at each conversion enables more than simple counts. Histograms can also aggregate values, which might be used to differentiate between different outcomes. The value that is allocated to impressions is called a conversion value. A higher conversion value might be used for larger purchases or any outcome that is more highly valued. A conversion value might also be divided among multiple impressions to share credit, though this capability is not presently supported in the API.
-
Compatibility with privacy-preserving aggregation services
-
Flexibility to assign buckets
-
As histogram size increases, noise becomes a problem
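As a non-normative illustration of the histogram described in this section, the following sketch tallies conversion values into buckets. The function name tallyConversions and the shape of its input records are assumptions made for this example only; the API itself never exposes per-conversion records of this kind to a site.

```javascript
// Illustrative only: tallyConversions and the record shape are hypothetical.
// Each record names the bucket chosen for the attributed impression and the
// conversion value allocated to it.
function tallyConversions(histogramSize, attributed) {
  const histogram = new Array(histogramSize).fill(0);
  for (const { bucket, value } of attributed) {
    // Ignore records that point outside the histogram.
    if (Number.isInteger(bucket) && bucket >= 0 && bucket < histogramSize) {
      histogram[bucket] += value;
    }
  }
  return histogram;
}

// Buckets 0..2 stand for three creatives; a value above 1 weights an outcome
// more heavily than a simple count would.
const exampleHistogram = tallyConversions(3, [
  { bucket: 0, value: 1 },
  { bucket: 2, value: 3 },
  { bucket: 2, value: 1 },
]);
console.log(exampleHistogram); // bucket totals: 1, 0, 4
```

Grouping by a different attribute only changes how buckets are assigned; the tallying step stays the same.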
2. Overview of Operation
The private attribution API provides aggregate information about the association between two classes of events: impressions and conversions.
An impression is any action that an advertiser takes on any website. The API does not constrain what can be recorded as an impression. Typical actions that an advertiser might seek to measure include:
-
Displaying an advertisement.
-
Having a user interact with an advertisement in some way.
-
Not displaying an advertisement (especially for controlled experiments that seek to confirm whether an advertising campaign is effective).
For the API, a conversion is an outcome that is being measured. The API does not constrain what might be considered to be an outcome. Typical outcomes that advertisers might seek to measure include:
-
Making a purchase.
-
Signing up for an account.
-
Visiting a webpage.
The remainder of this section describes how the Private Attribution API operates in conjunction with an aggregation service to produce an aggregate attribution measurement. That operation is illustrated in the following figure.
When an impression occurs, the saveImpression() method can be used to request that the browser save information. This includes an identifier for the impression and some additional information about the impression. For instance, advertisers might use additional information to record whether the impression was an ad view or an ad click.
At conversion time, a conversion report is created. A conversion report is an encrypted histogram contribution that includes information from any impressions that the browser previously stored.
The measureConversion() method accepts a set of options that tell the browser how to construct a conversion report. These include a simple query that selects from the impressions that the browser has stored, a conversion value that is allocated to the selected impression(s), and other information needed to construct the conversion report.
The histogram created by the conversion report is constructed as follows:
-
If the query found no impressions, or the privacy budget for the site is exhausted, a histogram consisting entirely of zeros (0) is constructed.
-
If one or more matching impressions are found, the browser runs the attribution logic (default last-touch) to select the most recent impression. The provided conversion value is added to a histogram at the bucket that was specified at the time of the attributed impression. All other buckets are set to zero.
The browser updates the privacy budget store to reflect the reported conversion.
The resulting histogram is prepared for aggregation according to the requirements of the chosen aggregation service and returned to the site. This minimally involves encryption of the histogram.
A site that invokes this API will always receive a valid conversion report. As a result, sites learn nothing about what happened on other sites from this interaction.
The site can collect the encrypted histograms it receives from calls to this API and submit them to the aggregation service.
Upon receiving a set of encrypted histograms from a site, the aggregation service:
-
confirms that it has not previously computed an aggregate from the provided inputs and that there are enough conversion reports,
-
adds the histograms including sufficient noise to produce a differentially-private aggregate histogram, and
-
returns the aggregate to the site.
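The aggregation steps above can be sketched as follows. This is a non-normative simplification: aggregateReports, minReports, and sampleNoise are hypothetical names, the histograms are shown in the clear rather than encrypted, and the check for previously aggregated inputs is omitted.

```javascript
// Illustrative only: all names are hypothetical. A real aggregation service
// operates on encrypted or secret-shared reports, never on plain arrays.
function aggregateReports(histograms, { minReports, sampleNoise }) {
  // Refuse to aggregate a batch that is empty or too small.
  if (histograms.length === 0 || histograms.length < minReports) {
    throw new Error("not enough conversion reports to aggregate");
  }
  const size = histograms[0].length;
  const total = new Array(size).fill(0);
  for (const histogram of histograms) {
    if (histogram.length !== size) {
      throw new Error("histograms must all have the same size");
    }
    histogram.forEach((value, i) => {
      total[i] += value;
    });
  }
  // Noise is added once to each bucket of the aggregate, not to each report.
  return total.map((sum) => sum + sampleNoise());
}
```

Passing the noise source as a parameter keeps the sketch independent of any particular differential privacy mechanism.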
3. API Details
A site using the Private Attribution API will typically register either impressions or conversions, but in some cases the same site may do both.
To register an impression, a site calls saveImpression(). No preparation is required to use this API beyond collecting parameter values, although it may be useful to examine the supported aggregationServices in deciding whether to use the Private Attribution API.
To request a conversion report, a site calls measureConversion().
Before calling this API, a site must select a supported aggregation service. The page may select any of the supported services found in aggregationServices. The name of the selected service must be supplied as the aggregator member of the PrivateAttributionConversionOptions dictionary when calling the measureConversion() method.
3.1. Finding a Supported Aggregation Service
The aggregationServices attribute contains a list of aggregation services supported by the user agent. The page must select and specify one of these services when calling the measureConversion() method. It may also be useful to query the supported services before registering an impression, but that is not required, and impressions are not scoped to a single aggregation service.
enum PrivateAttributionProtocol { "dap-12-histogram", "tee-00" };

dictionary PrivateAttributionAggregationService {
  required DOMString url;
  required DOMString protocol;
};

[SecureContext, Exposed=Window]
interface PrivateAttributionAggregationServices {
  readonly setlike<PrivateAttributionAggregationService>;
};

[SecureContext, Exposed=Window]
interface PrivateAttribution {
  readonly attribute PrivateAttributionAggregationServices aggregators;
};
The aggregationServices attribute contains the following information about each supported aggregation service:
url, of type DOMString
- A URL that identifies an aggregation service. This value is passed as the aggregator parameter to measureConversion() to select the identified aggregation service.
protocol, of type DOMString
- The protocol that the aggregation service uses. Different versions of the same protocol use different values. Even if a single service provider supports multiple protocols, each needs to use a different URL. This ensures that each can be uniquely identified by URL without also specifying the choice of protocol.
The PrivateAttributionProtocol describes the submission protocol used by different aggregation services. This document defines two protocols:
dap-12-histogram
- A DAP-based protocol [DAP] that uses MPC; see § 6.1 Multi-Party Computation Aggregation.
tee-00
- A protocol for submission to a TEE; see § 6.2 Trusted Execution Environments.
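A page that can work with more than one protocol might choose a service as in the following non-normative sketch. The helper chooseAggregationService and its preference-list parameter are assumptions of this example; the only things taken from this document are the url and protocol members of each entry.

```javascript
// Illustrative only: chooseAggregationService is a hypothetical helper.
// `services` is any iterable of { url, protocol } entries, such as the
// setlike object described in this section.
function chooseAggregationService(services, preferredProtocols) {
  const available = Array.from(services);
  // Walk the site's preferences in order and take the first supported one.
  for (const protocol of preferredProtocols) {
    const match = available.find((service) => service.protocol === protocol);
    if (match) {
      return match.url;
    }
  }
  return null; // No supported service uses a protocol the site can handle.
}
```

The returned URL is the value a page would supply as the aggregator member; a null result suggests skipping measurement in this user agent.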
3.2. Saving Impressions
The saveImpression() method requests that the user agent record an impression in the impression store.
navigator.privateAttribution.saveImpression({
  histogramIndex: 3,
  filterData: 2,
  conversionSite: "advertiser.example",
  lifetimeDays: 7,
});
dictionary PrivateAttributionImpressionOptions {
  required unsigned long histogramIndex;
  required unsigned long filterData;
  required DOMString conversionSite;
  unsigned long lifetimeDays = 30;
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] undefined saveImpression(PrivateAttributionImpressionOptions options);
};
The arguments to saveImpression() are as follows:
histogramIndex, of type unsigned long
- If measureConversion() matches this impression with a subsequent conversion, the conversion value will be added to the histogram bucket identified by this index.
filterData, of type unsigned long
- An optional piece of metadata associated with the impression. The filterData can be used to identify which impressions may receive attribution from a conversion.
conversionSite, of type DOMString
- The site where conversions for this impression may occur, identified by its domain name. The measureConversion() method will only attribute to this impression when called by the indicated site.
lifetimeDays, of type unsigned long, defaulting to 30
- A "time to live" (in days) after which the impression can no longer receive attribution. If not specified, the default is 30 days. The user agent should impose an upper limit on the lifetime, and silently reduce the value specified here if it exceeds that limit.
3.2.1. Operation
- Collect the implicit API inputs:
  - The current timestamp
  - The impression site domain
  - The iframe site domain
- Validate the page-supplied API inputs
- If the private attribution API is enabled, save the impression to the impression store.
saveImpression() does not return a status indicating whether the impression was recorded. This minimizes the ability to detect when the Private Attribution API is disabled.
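The operation above can be sketched from the user agent's side as follows. This is a non-normative illustration: the function saveImpressionSketch, the array used as an impression store, the MAX_LIFETIME_DAYS limit, and the mapping of the iframe site domain to the stored intermediary site are all assumptions of this example.

```javascript
// Illustrative only: every name here is hypothetical. MAX_LIFETIME_DAYS
// stands in for a user agent's own upper limit, which this document leaves
// to the implementation.
const MAX_LIFETIME_DAYS = 30;

function saveImpressionSketch(store, enabled, implicit, options) {
  // Validate the page-supplied inputs. Validation happens whether or not
  // the API is enabled, so errors do not reveal the user's choice.
  for (const key of ["histogramIndex", "filterData"]) {
    if (!Number.isInteger(options[key]) || options[key] < 0) {
      throw new TypeError(`${key} must be a non-negative integer`);
    }
  }
  if (typeof options.conversionSite !== "string" || options.conversionSite === "") {
    throw new TypeError("conversionSite must be a non-empty string");
  }
  // Silently reduce a lifetime that exceeds the user agent's limit.
  const lifetimeDays = Math.min(options.lifetimeDays ?? 30, MAX_LIFETIME_DAYS);

  if (enabled) {
    store.push({
      filterData: options.filterData,
      impressionSite: implicit.impressionSite,
      intermediarySite: implicit.intermediarySite, // assumed: the iframe site
      conversionSite: options.conversionSite,
      timestamp: implicit.timestamp,
      lifetimeDays,
      histogramIndex: options.histogramIndex,
    });
  }
  // Nothing is returned either way, so the page cannot tell whether the
  // impression was actually stored.
  return undefined;
}
```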
3.3. Requesting Attribution for a Conversion
The measureConversion() method requests that the user agent perform attribution for a conversion, and return a conversion report.
The measureConversion() method always returns a conversion report, regardless of whether matching impression(s) are found. If there is no match, or if differential privacy disallows reporting the attribution, the returned conversion report will not contribute to the histogram, i.e., will be uniformly zero.
navigator.privateAttribution.measureConversion({
  // name of the aggregation service
  aggregator: "aggregator.example",
  // the number of buckets in the histogram
  histogramSize: 20,
  // the amount of privacy budget to use
  epsilon: 1,
  // the attribution logic to use
  logic: "last-touch",
  // the value to assign to the histogram index of the impression
  value: 3,
  // the maximum value which can be generated across all reports included in the aggregation
  // used together with epsilon to calibrate the differential privacy budget to use
  maxValue: 5,
  // only consider impressions within the last N days
  lookbackDays: 30,
  // an optional filter to restrict the set of ads that can be attributed
  filterData: 2,
  // an optional list of sites where impressions might have been registered
  impressionSites: ["publisher.example"],
  // an optional list of sites which called the saveImpression API
  intermediarySites: ["ad-tech.example"],
});
dictionary PrivateAttributionConversionOptions {
  required DOMString aggregator;
  double epsilon = 1.0;
  required unsigned long histogramSize;
  PrivateAttributionLogic logic = "last-touch";
  unsigned long value = 1;
  unsigned long maxValue = 1;
  unsigned long lookbackDays;
  unsigned long filterData;
  sequence<DOMString> impressionSites = [];
  sequence<DOMString> intermediarySites = [];
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] Promise<Uint8Array> measureConversion(PrivateAttributionConversionOptions options);
};
The arguments to measureConversion() are as follows:
aggregator, of type DOMString
- A selection from the aggregation services that can be found in aggregators.
epsilon, of type double, defaulting to 1.0
- The amount of privacy budget to expend on this conversion report.
histogramSize, of type unsigned long
- The number of histogram buckets to use in the conversion report.
logic, of type PrivateAttributionLogic, defaulting to "last-touch"
- A selection from PrivateAttributionLogic indicating the attribution logic to use.
value, of type unsigned long, defaulting to 1
- The conversion value. If an attribution is made and privacy restrictions are satisfied, this value will be encoded into the conversion report.
maxValue, of type unsigned long, defaulting to 1
- The maximum conversion value across all contributions included in the aggregation. Together with epsilon, this is used to calibrate the distribution of random noise that will be added to the outcome. It is also used to determine the amount of privacy budget to expend on this conversion report.
lookbackDays, of type unsigned long
- An integer number of days. Only impressions occurring within the past lookbackDays may match this conversion.
filterData, of type unsigned long
- Only impressions having a filterData value matching this value will be eligible to match this conversion.
impressionSites, of type sequence<DOMString>, defaulting to []
- A list of impression sites. Only impressions recorded where the top-level site is on this list are eligible to match this conversion.
intermediarySites, of type sequence<DOMString>, defaulting to []
- A list of sites which called the saveImpression() API. Only impressions recorded by scripts originating from one of the intermediary sites are eligible to match this conversion.
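A site might wrap the call as in the following non-normative sketch, which requests a report and hands it to site-defined code for later submission to the aggregation service. The helper requestConversionReport and the upload callback are assumptions of this example; api is expected to behave like navigator.privateAttribution.

```javascript
// Illustrative only: requestConversionReport and `upload` are hypothetical.
// `api` is any object with a measureConversion() method that resolves to a
// Uint8Array, as navigator.privateAttribution is described as doing.
async function requestConversionReport(api, upload) {
  const report = await api.measureConversion({
    aggregator: "aggregator.example",
    histogramSize: 20,
    epsilon: 1,
    value: 3,
    maxValue: 5,
    lookbackDays: 30,
    filterData: 2,
    impressionSites: ["publisher.example"],
  });
  // The report is opaque to the page: it is encrypted and is produced
  // whether or not any impression matched.
  await upload(report);
  return report.byteLength;
}
```

In a page, upload would typically send the bytes to the site's own server, which batches reports before submitting them for aggregation.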
3.3.1. Operation
- Collect the implicit API inputs
  - The current timestamp
  - The conversion site domain
  - The iframe site domain
- Validate the page-supplied API inputs
  - If logic is specified, and the value is anything other than "last-touch", return an error.
- If the private attribution API is enabled, invoke the routine to fill a histogram using last-touch attribution.
- Encrypt the report.
- Return the encrypted report.
3.4. Impression Store
The impression store is used by the measureConversion() method to find matching impressions.
3.4.1. Contents
The impression store must store the following information:
Field | Description
---|---
Filter Data | The filterData value passed to saveImpression().
Impression Site | The site that called saveImpression().
Intermediary Site | The site corresponding to the script that called saveImpression().
Conversion Sites | The conversion site(s) that were passed to saveImpression().
Timestamp | The time at which saveImpression() was called.
Lifetime | The number of days an impression remains eligible for attribution, either from the call to saveImpression(), or a user agent-defined limit.
Histogram Index | The histogram index passed to saveImpression().
3.4.2. Maintenance
The user agent should periodically use the timestamp and lifetime values to identify and delete any impressions in the impression store that have expired.
It is not necessary to remove impressions immediately upon expiry, as long as measureConversion() excludes expired impressions from attribution. However, the user agent should not retain expired impressions indefinitely.
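The maintenance behavior described above might look like the following. This is a non-normative sketch: the field names timestamp and lifetimeDays, the use of millisecond timestamps, and treating an impression as expired only once its full lifetime has elapsed are all assumptions of this example.

```javascript
// Illustrative only: field names and time units are hypothetical.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An impression is expired once more than lifetimeDays have passed since it
// was saved. The same check can be applied at attribution time so that
// expired impressions are excluded even before they are purged.
function isExpired(impression, now) {
  return now - impression.timestamp > impression.lifetimeDays * MS_PER_DAY;
}

// Returns a new store containing only the impressions that are still live.
function purgeExpired(store, now) {
  return store.filter((impression) => !isExpired(impression, now));
}
```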
3.4.3. Clearing
A mechanism must be provided to clear the impression store. For example, the impression store could be cleared upon activation of the control that disables the Private Attribution API. It is recommended that any mechanism a user agent provides to clear stored browsing data (history, cookies, etc.) be extended to cover the impression store.
3.5. Privacy Budget Store
The privacy budget store records the state of the per-site privacy budgets, and of any safety limits. It is updated by deduct privacy budget.
The privacy budget store needs to be described in more detail. Some references to clearing the impression store may need to be updated to refer to the privacy budget store as well.
3.6. Attribution Logic
A site that measures conversions can specify attribution logic, which determines how the conversion value is allocated to histogram buckets. The measureConversion() function accepts a logic parameter that specifies the attribution logic.
enum PrivateAttributionLogic {
  "last-touch",
};
Each attribution logic specifies a process for allocating values to histogram buckets. This logic includes how to select impressions, how to handle weeks in which the privacy budget is insufficient, and (optionally) how to process any additional parameters that might be used.
3.6.1. Last Touch Attribution
The "last-touch" attribution logic indicates that the browser should select the last (most recent) impression that matches the common matching logic. The entire conversion value (up to the maximum imposed by the privacy budget) is allocated to the histogram bucket that was saved with the impression.
Last touch attribution does not select any impression that was saved during a week that does not have sufficient privacy budget. If impressions match from a week that does not have enough privacy budget, impressions are not matched for any preceding weeks. That is, once a week has a matching impression and insufficient budget, the process will set a value of zero for all histogram buckets.
To fill a histogram using last-touch attribution, given options:
- Initialize impression to a null value.
- Initialize value to options.value.
- Let now be the current time.
- For each week starting from the current week to the oldest week supported by the user agent:
  - Let impressions be the result of invoking common matching logic with options, week, and now.
  - If impressions is not empty:
    - Retain the value of week.
    - Set impression to the value in impressions with the most recent impression.timestamp.
    - Exit the loop.
- If impression is null, let budgetOk be false.
- Otherwise, let budgetOk be the result of deduct privacy budget with week and options.epsilon.
- If budgetOk is false, set value to 0.
- If impression.histogramIndex is options.histogramSize or greater, set value to 0.
- If value is not 0, set index to impression.histogramIndex.
- Otherwise, set index to 0.
- Return a histogram containing options.histogramSize values, with a value of value at an index of index and a value of zero at all other indices.
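The steps above can be sketched as follows. This is a non-normative illustration: the function fillHistogramLastTouch, the weeks argument (a list of weeks, newest first), and the two callbacks standing in for common matching logic and deduct privacy budget are assumptions of this example. The sketch also guards the histogram index check against a null impression, which the steps leave implicit.

```javascript
// Illustrative only. `matchImpressions(options, week, now)` and
// `deductBudget(week, epsilon)` are supplied by the caller and stand in for
// the common matching logic and "deduct privacy budget".
function fillHistogramLastTouch(options, weeks, now, matchImpressions, deductBudget) {
  let impression = null;
  let matchedWeek = null;
  let value = options.value;

  // Search from the current week backwards and stop at the first week that
  // has any matching impression.
  for (const week of weeks) {
    const impressions = matchImpressions(options, week, now);
    if (impressions.length > 0) {
      matchedWeek = week;
      impression = impressions.reduce((a, b) => (b.timestamp > a.timestamp ? b : a));
      break;
    }
  }

  // Budget is only deducted when an impression was found.
  const budgetOk = impression !== null && deductBudget(matchedWeek, options.epsilon);
  if (!budgetOk) {
    value = 0;
  }
  if (impression !== null && impression.histogramIndex >= options.histogramSize) {
    value = 0;
  }

  const index = value !== 0 ? impression.histogramIndex : 0;
  const histogram = new Array(options.histogramSize).fill(0);
  if (index < histogram.length) {
    histogram[index] = value;
  }
  return histogram;
}
```

Because the result always has options.histogramSize entries, a caller cannot distinguish "no match" from "no budget" by the shape of the output.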
3.6.2. Common Impression Matching Logic
TODO specify how to match using "lookbackDays", "filterData" and "impressionSites".
Discuss "infinite" lookbackDays. Clarify when it applies. When the field is missing? Zero?
To perform common matching logic, given options, week, and moment now:
- If the number of days since the end of week exceeds lookbackDays, return an empty set.
- Initialize matching to an empty set.
- For each impression in the saved impressions for the week:
  - If now - lookbackDays is after impression.timestamp, continue the loop.
  - If options.filterData does not match impression.filterData, continue the loop.
  - If options.impressionSites does not contain impression.impressionSite, continue the loop.
  - Add impression to matching.
- Return matching.
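The matching steps above can be sketched as follows. This is a non-normative illustration that fills gaps this section leaves open with assumptions that are not taken from this document: a missing lookbackDays is treated as no limit, a missing filterData matches any impression, and an empty impressionSites list matches any site. The week object shape ({ end, impressions }) and millisecond timestamps are also hypothetical. Checks for the conversion site, intermediary sites, and impression lifetime are omitted.

```javascript
// Illustrative only: the handling of missing or empty options below is an
// assumption of this sketch, not defined behavior.
const DAY_MS = 24 * 60 * 60 * 1000;

function commonMatchingLogic(options, week, now) {
  const { lookbackDays, filterData, impressionSites = [] } = options;
  const hasLookback = lookbackDays !== undefined;

  // Skip the whole week if it ended before the lookback window began.
  if (hasLookback && (now - week.end) / DAY_MS > lookbackDays) {
    return [];
  }

  const matching = [];
  for (const impression of week.impressions) {
    if (hasLookback && now - lookbackDays * DAY_MS > impression.timestamp) {
      continue; // Saved before the lookback window.
    }
    if (filterData !== undefined && filterData !== impression.filterData) {
      continue;
    }
    if (impressionSites.length > 0 && !impressionSites.includes(impression.impressionSite)) {
      continue;
    }
    matching.push(impression);
  }
  return matching;
}
```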
3.7. User Control and Visibility
3.7.1. Optional Participation
-
Users should be able to opt out. Opt out should be undetectable.
Text fragment moved from privacy section:
This mechanism may be a dedicated control for the Private Attribution API, or it may be a consolidated privacy control that applies to multiple features, including private attribution. Further, user agent developers should consider interaction of other privacy modes with the Private Attribution API. For example, attribution might be disabled in a private browsing mode, or it might be disabled if the user has opted out of collection of diagnostic data.
3.7.2. Visibility
-
User ability to view the impression store and past report submissions.
4. Permissions Policy Integration
This specification defines two policy-controlled features:
-
Invocation of the saveImpression() API, identified by the string "save-impression".
-
Invocation of the measureConversion() API, identified by the string "measure-conversion".
The default allowlist for both of these features is *.
Having separate permissions for saveImpression() and measureConversion() allows pages that do both to limit subresources to the expected kind of activity.
Enabling permissions by default simplifies the task of integrating external services.
Permissions policy provides only all-or-nothing control; it does not enable delegation of a portion of the privacy budget.
5. Implementation Considerations
- Management and distribution of values for the following:
  - Histogram size
  - Conversion site for impressions
  - Impression site for conversions
  - Ad IDs
6. Aggregation
An aggregation service takes multiple pieces of attribution information and produces an aggregate metric.
User agent implementations will have different requirements for aggregation. However, the aggregation process has some common elements.
Firstly, user agents will need to be configured with, or otherwise obtain, information about the aggregation service. This includes the aggregation methods that are supported and any configuration that is required.
Each aggregation method needs to define how a histogram is:
-
prepared for aggregation,
-
encrypted,
-
annotated with any necessary metadata, and
-
submitted to the aggregation service for aggregation.
The aggregation method also needs to define how the aggregated result is obtained by a site.
6.1. Multi-Party Computation Aggregation
A Multi-Party Computation (MPC) system is one that involves multiple independent entities that cooperatively compute an agreed function.
This specification uses an MPC system based on Prio [PRIO] and the Distributed Aggregation Protocol (DAP) [DAP]. This is a two-party MPC system that is characterized by its reliance on client-provided proofs of correctness for inputs. This allows for very efficient MPC operation at a modest cost in the size of submissions to the system.
An aggregator that uses Multi-Party Computation (MPC) comprises two or more independent services that cooperate to compute a predefined function.
The basic guarantee provided by MPC is that only the defined outputs of a function, plus well-defined leakage, are revealed to any entity.
The MPC guarantees hold only to the extent that a subset of the entities that participate are honest. For the two-party MPC used in Prio, privacy—that is, the confidentiality of inputs—is maintained as long as either MPC operator remains honest. This MPC configuration does not protect against the corruption of the outputs by either MPC operator.
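The confidentiality property can be illustrated with additive secret sharing, the idea that two-party aggregation of this kind builds on. The following non-normative sketch is not Prio or DAP: it has no validity proofs and no encryption, the field prime is an arbitrary choice for the example, all function names are hypothetical, and the default randomness source is not cryptographically secure.

```javascript
// Illustrative only: additive secret sharing over a prime field.
const FIELD_PRIME = 2n ** 61n - 1n; // arbitrary prime chosen for this sketch

function insecureRandomFieldElement() {
  // Math.random is NOT secure; it is used only to keep the sketch short.
  return BigInt(Math.floor(Math.random() * Number.MAX_SAFE_INTEGER)) % FIELD_PRIME;
}

// Split a histogram into two shares that sum to it modulo FIELD_PRIME.
// Each share on its own is a list of random-looking field elements.
function splitHistogram(histogram, random = insecureRandomFieldElement) {
  const shareA = histogram.map(() => random());
  const shareB = histogram.map(
    (v, i) => (((BigInt(v) - shareA[i]) % FIELD_PRIME) + FIELD_PRIME) % FIELD_PRIME
  );
  return [shareA, shareB];
}

// Each operator adds up the shares it received, bucket by bucket.
function sumShares(shares) {
  return shares.reduce((acc, share) => acc.map((v, i) => (v + share[i]) % FIELD_PRIME));
}

// Only the combination of both operators' sums reveals the aggregate.
function combineSums(sumA, sumB) {
  return sumA.map((v, i) => Number((v + sumB[i]) % FIELD_PRIME));
}
```

Neither operator sees an individual histogram, yet the combined sums equal the bucket totals. Nothing in this sketch stops an operator from altering its sum, which mirrors the limitation on output integrity noted above.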
6.1.1. Prio and DAP
The "dap-12-histogram" aggregation method uses Prio [PRIO] and the Distributed Aggregation Protocol (DAP) [DAP]. Specifically, this aggregation method uses the Prio3L1BoundSum instantiation [PRIO-L1] of the Prio3 Verifiable Distributed Aggregation Function (VDAF) [VDAF].
DAP and the Prio3L1BoundSum instantiation define how a report is prepared, encrypted, and submitted for aggregation. DAP also defines how an aggregate is obtained and what configuration a user agent needs to obtain about the aggregation service.
Several extensions to DAP [DAP-EXT] are necessary for this application:
-
Late task binding improves the ability of a site to collect reports and aggregate them as needed.
-
Website identity is critical to ensure that differential privacy protections are effective. This prevents a malicious actor that is able to correlate user identity across multiple sites from exceeding the sensitivity bounds for that user by aggregating reports from multiple sites together.
-
Privacy budget consumption ensures that the aggregator does not aggregate reports that received less privacy budget than the aggregation task was configured with.
User agents need to include all of these extensions in reports that they generate.
6.2. Trusted Execution Environments
A Trusted Execution Environment (TEE) uses specialized hardware to ensure that computation is isolated from other programs that run on the same hardware.
TODO
6.3. Anti-Replay Requirements
Conversion reports generated by browsers are bound to the amount of privacy budget that was expended by the site that requested the report.
An aggregation service MUST guarantee that it does not accept the same report more than once.
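One way to meet this requirement is to remember an identifier for every accepted report. The sketch below assumes that each report carries a unique identifier, here called `report_id`; the class and method names are hypothetical, and a real service would also need durable storage that stays consistent across its operators.

```python
class ReplayGuard:
    """Rejects any report whose identifier has already been accepted."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def accept(self, report_id: bytes) -> bool:
        """Return True the first time an identifier is seen, False afterwards."""
        if report_id in self._seen:
            return False
        self._seen.add(report_id)
        return True
```

Without such a check, a site could submit the same report to several aggregations and obtain more information than the privacy budget bound to that report allows.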
7. Differential Privacy
This design uses the concept of differential privacy as the basis of its privacy design. [PPA-DP]
Differential privacy is a mathematical definition of privacy that can guarantee the amount of private information that is revealed by a system. [DP] Differential privacy is not the only means by which privacy is protected in this system, but it is the most rigorously defined and analyzed. As such, it provides the strongest privacy guarantees.
Differential privacy uses randomized noise to hide private data contributions to an aggregated dataset. The effect of noise is to hide individual contributions to the dataset, but to retain the usefulness of any aggregated analysis.
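As an illustration of how noise hides individual contributions, the following sketch applies the Laplace mechanism, a standard differential privacy mechanism, to a single sum. This document does not mandate the Laplace mechanism; the function names and parameters are assumptions for this example, and floating-point sampling of this kind is not suitable for production use.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale for the Laplace mechanism: sensitivity divided by epsilon."""
    return sensitivity / epsilon


def noisy_sum(true_sum: float, sensitivity: float, epsilon: float,
              rng: random.Random) -> float:
    """Add Laplace noise to an aggregate before it is released."""
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two exponential samples with mean `scale`
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_sum + noise
```

A smaller ε produces a larger noise scale and therefore stronger protection. Because the noise has a mean of zero, its relative effect shrinks as the number of contributions to the aggregate grows.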
To apply differential privacy, it is necessary to define what information is protected. In this system, the protected information is the impressions of a single user profile, on a single user agent, over a single week, for a single website that registers conversions. § 7.1 Privacy Unit describes the implications of this design in more detail.
This attribution design uses a form of differential privacy called individual differential privacy. In this model, user agents are each separately responsible for ensuring that they limit the information that is contributed.
The individual differential privacy design of this API has three primary components:
-
User agents limit (using the privacy budget) the amount of information about impressions that leaves the device through conversion reports. § 7.2 Privacy Budgets explores this in greater depth.
-
Aggregation services ensure that any given conversion report is only used in accordance with the privacy budget that the user agent deducted for it. § 6.3 Anti-Replay Requirements describes requirements on aggregation services in more detail.
-
Noise is added by aggregation services. § 7.3 Differential Privacy Mechanisms details the mechanisms that might be used.
Together, these measures place limits on the information that is released for each privacy unit.
7.1. Privacy Unit
An implementation of differential privacy requires a clear definition for what is protected. This is known as the privacy unit, which represents the entity that receives privacy protection.
This system adopts a privacy unit that is the combination of three values:
-
A user agent profile. That is, an instance of a user agent, as used by a single person.
-
The site that requests information about impressions.
The sites that register impressions are not considered. Those sites do not receive information from this system directly.
-
The current week.
A change to any of these values produces a new privacy unit, which results in a separate privacy budget. Each site that a person visits receives a bounded amount of information for each week.
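The effect of this definition can be sketched as a budget table keyed by the three values. The type, the field names, and the `INITIAL_BUDGET` value below are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class PrivacyUnit(NamedTuple):
    profile: str  # user agent profile
    site: str     # site that requests information about impressions
    week: int     # index of the current week


INITIAL_BUDGET = 1.0  # hypothetical budget granted to each privacy unit

budgets: dict[PrivacyUnit, float] = {}


def remaining_budget(unit: PrivacyUnit) -> float:
    """Look up the budget for a unit, creating it on first use."""
    return budgets.setdefault(unit, INITIAL_BUDGET)


def spend(unit: PrivacyUnit, epsilon: float) -> None:
    """Reduce the budget of exactly one privacy unit."""
    budgets[unit] = remaining_budget(unit) - epsilon


this_week = PrivacyUnit("profile-1", "shop.example", 12)
spend(this_week, 0.25)
next_week = this_week._replace(week=13)
```

Spending budget for `this_week` leaves the budget for `next_week`, or for the same week on a different site, untouched.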
Ideally, the privacy unit is a single person. Though ideal, it is not possible to develop a useful system that guarantees perfect correspondence with a person, for a number of reasons:
-
People use multiple browsers and multiple devices, often without coordination.
-
A unit that covered all websites could be exhausted by one site, denying other sites any information.
-
Advertising is an ongoing activity. Without allocating privacy budget for new data, sites could exhaust their budget forever.
7.1.1. Browser Instances
Each browser instance manages a separate privacy budget.
Coordination between browser instances might be possible, but it is not expected. That coordination might allow privacy to be improved by reducing the total amount of information that is released. It might also improve the utility of attribution by allowing impressions on one browser instance to be converted on another.
Coordination across different implementations is presently out of scope for this work. Implementations can perform some coordination between instances that are known to be for the same person, but this is not mandatory.
7.1.2. Per-Site Limits
Information is released to websites on a per-site basis. This aligns with the boundary used in other privacy-relevant functions.
A finer privacy unit, such as an origin, would make it trivial to obtain additional information. Information about the same person could be gathered from multiple origins. That information could then be combined by exploiting the free flow of information within the site, using cookies [COOKIES] or similar.
§ 7.2.2 Safety Limits discusses attacks that exploit this limit and some additional safety limits that might be implemented by user agents to protect against those attacks.
7.1.3. Privacy Budget Epochs
Sites receive a separate differential privacy budget for the data in every time interval, called an epoch. The epoch length is one week.
This budget applies to the impressions that are registered with the user agent and later queried, not conversions.
From the perspective of the analysis [PPA-DP] each week of impressions forms a separate database. A finite privacy budget is enforced across all the queries made on each database.
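Assigning an impression to its weekly database amounts to computing an epoch index from the time the impression was saved. The sketch below assumes a fixed reference point, `EPOCH_START`, which is a hypothetical value; how a user agent chooses the start of its epochs is not described in this section.

```python
from datetime import datetime, timedelta, timezone

EPOCH_LENGTH = timedelta(weeks=1)
# Hypothetical reference point from which epochs are counted.
EPOCH_START = datetime(2025, 1, 6, tzinfo=timezone.utc)


def epoch_index(when: datetime) -> int:
    """Return the index of the one-week epoch that contains `when`."""
    return (when - EPOCH_START) // EPOCH_LENGTH
```

Impressions that share an epoch index draw on the same privacy budget for a given site; impressions in different epochs draw on separate budgets.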
Having a conversion report produced from impressions that span multiple weeks has privacy consequences. A single visit to a website can give that site information about activities across many weeks. This only requires that the conversion site is identified as the destination for impressions over that entire period. The number of weeks that can be queried is limited by user agents.
The goal is to set an epoch that is as large as feasible. A longer period of time allows for a better privacy/utility balance because sites can be allocated a larger overall budget at any point in time, while keeping the overall rate of privacy loss low. However, a longer interval means that it is easier to exhaust a privacy budget completely, yielding no information until the next refresh.
The choice of a week is largely arbitrary. One week is expected to be enough to allow sites the ability to make decisions about how to spend privacy budgets without careful planning that needs to account for changes that might occur days or weeks in the future.
§ 7.2 Privacy Budgets describes the process for budgeting in more detail.
7.2. Privacy Budgets
Browsers maintain a privacy budget, which is a means of limiting the amount of privacy loss.
This specification uses an individual form of (ε, δ)-differential privacy as its basis. In this model, privacy loss is measured using the value ε. The δ value is handled by the aggregation service when adding noise to aggregates.
Each user agent instance is responsible for managing privacy budgets.
Each conversion report that is requested specifies an ε value, which represents the amount of privacy budget that the report consumes, and a maximum for the value that can be returned in the conversion report.
7.2.1. Privacy Budget Deduction
When searching for impressions for the conversion report, the user agent deducts the specified ε value from the budget for the week in which those impressions were saved. If the privacy budget for that week is not sufficient, the impressions from that week are not used.
The details of how to deduct privacy budget are given below ... WIP
A conversion report might be requested at the time marked with "now". That conversion report selects impressions marked with black circles, corresponding to impressions from Sites B, C, and E.
As a result, privacy budget for the querying site is deducted from weeks 1, 3, 4, and 5. No impressions were recorded for week 2, so no budget is deducted from that week.
How a user agent manages exhaustion of a privacy budget depends on the attribution logic that was specified.
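The deduction rule stated in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the detailed algorithm that remains to be specified: the function name, the data layout, and the sample budgets are all hypothetical.

```python
def deduct(budgets: dict[int, float],
           impressions: dict[int, list[str]],
           epsilon: float) -> list[str]:
    """Deduct epsilon from each week that has matching impressions.

    Returns the impressions that can be used for the conversion report.
    """
    usable: list[str] = []
    for week, saved in sorted(impressions.items()):
        if not saved:
            continue  # no impressions this week: nothing is deducted
        if budgets.get(week, 0.0) < epsilon:
            continue  # insufficient budget: this week's impressions are not used
        budgets[week] -= epsilon
        usable.extend(saved)
    return usable


# Hypothetical state for the querying site: week 4 is nearly exhausted.
budgets = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.2, 5: 1.0}
impressions = {1: ["B"], 2: [], 3: ["C"], 4: ["E"], 5: ["B"]}
usable = deduct(budgets, impressions, epsilon=0.5)
```

In this run, week 2 is left untouched because it holds no impressions, and week 4 is skipped because its remaining budget is smaller than the requested ε.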
7.2.2. Safety Limits
The basic privacy unit is vulnerable to attack by an adversary that is able to correlate activity for the same person across multiple sites.
Groups of sites can sometimes coordinate their activity, such as when they have shared ownership or strong agreements. A group of sites that can be sure that a particular visitor is the same person, using any means, including something like FedCM [FEDCM], can combine information gained from this API.
This can be used to increase the rate at which a site gains information from attribution, proportional to the number of sites across which coordination occurs. The default privacy unit places no limit on the information released in this way.
To counteract this effect, user agents can implement safety limits, which are additional privacy budgets that do not consider site. Safety limits might be significantly higher than per-site budgets, so that they are not reached for most normal browsing activity. The goal would be to ensure that they take effect only during intensive activity or an attack.
Like the per-site privacy budget, it is critical that sites be unable to determine whether their request for a conversion report has caused a safety limit to be exceeded.
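A safety limit can be modeled as a second budget that is checked alongside the per-site budget. In the sketch below, the class name, method name, and limit values are hypothetical; the boolean result is internal to the user agent and is never exposed to the requesting site.

```python
class BudgetStore:
    """Tracks per-site budgets and a site-independent safety limit per week."""

    def __init__(self, per_site_budget: float, safety_limit: float) -> None:
        self.per_site_budget = per_site_budget
        self.safety_limit = safety_limit
        self.site_spent: dict[tuple[str, int], float] = {}
        self.total_spent: dict[int, float] = {}

    def try_deduct(self, site: str, week: int, epsilon: float) -> bool:
        """Deduct epsilon only if both the per-site and safety limits allow it."""
        site_spent = self.site_spent.get((site, week), 0.0)
        total_spent = self.total_spent.get(week, 0.0)
        if site_spent + epsilon > self.per_site_budget:
            return False
        if total_spent + epsilon > self.safety_limit:
            return False
        self.site_spent[(site, week)] = site_spent + epsilon
        self.total_spent[week] = total_spent + epsilon
        return True
```

With a safety limit in place, the total information released in a week is bounded no matter how many coordinating sites request conversion reports.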
7.3. Differential Privacy Mechanisms
The specific mechanisms that are used depend on the type of aggregation service.
8. Security Considerations
8.1. Impression Store
The impression store used by the Private Attribution API holds information related to browsing activity and persists across browsing sessions. Although the flow of information through the impression store is strictly controlled, it carries some amount of information across origins.
The following measures limit the possibility of harmful information flow through the impression store:
-
Websites cannot read from the impression store. Information from the impression store is released only via encrypted conversion reports. Differential privacy, provided by a combination of functionality in the user agent and in the aggregation service, provides a rigorous bound on the probability that the aggregated information output by the aggregation service is distinguishable from the value it would have absent any user’s contribution.
-
Users can explicitly clear the impression store.
-
It is recommended that user agents limit how long data can persist in the impression store, even absent explicit user action, by imposing a maximum value of lifetimeDays.
8.2. API Implementation
The Private Attribution APIs must be implemented carefully to maintain the required security and privacy properties. A site calling the APIs must not be able to learn:
-
Whether the Private Attribution APIs are enabled.
-
Whether an attribution occurred.
-
Whether the privacy budget is exhausted.
-
Whether the conversion report reflects a non-zero conversion value.
-
Which histogramIndex is assigned the conversion value.
Note that explicit return values or thrown exceptions are not the only way that a site can learn from the Private Attribution APIs. It may be possible to infer sensitive information from side channels like:
-
Variation in the time it takes for the APIs to complete.
-
Consumption of memory or storage by the API, if that consumption is somehow observable by the site.
While complete elimination of all side channels is impractical, implementations must make reasonable efforts to prevent leakage of sensitive information from the attribution APIs. Strategies to prevent leakage include:
-
Fully validating all API inputs, even when the API is disabled.
-
Avoiding conditional logic. For example, measureConversion() should always go through the full process of constructing a conversion report, even when the conversion value to be reported is zero.
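The second strategy can be sketched as a report builder that follows the same steps whether or not an attribution occurred. The function and parameter names are hypothetical, and Python itself offers no constant-time guarantees; the sketch only shows the structure of the logic.

```python
def build_histogram(size: int, index: int, value: int,
                    attributed: bool) -> list[int]:
    """Build a histogram of fixed size on every call.

    The same allocation and write happen whether or not an attribution
    occurred; only the value written differs.
    """
    histogram = [0] * size
    # Multiplying by the flag avoids a separate code path for the
    # no-attribution case.
    histogram[index] = value * int(attributed)
    return histogram
```

Both outcomes produce a histogram of the same length, so the subsequent encoding and encryption steps operate on inputs of identical shape.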
8.3. Aggregation Services
Although not part of the web platform, security of aggregation services is quite important to the overall security of the Private Attribution mechanism. Conversion reports produced by measureConversion() are encrypted to cryptographic key(s) of the aggregation service. Thus, much of the potential for disclosure of the information contained in these reports depends on the details of the aggregation service.
User agent developers should carefully consider the design of an aggregation service and the trustworthiness of the aggregation service operator before adding it as a supported service for the Private Attribution API. Additional discussion of these issues may be found in § 6 Aggregation and § 9 Privacy Considerations.
8.4. Combining Reports from Multiple Sites
The privacy mechanisms in the Private Attribution API operate primarily at the granularity of sites. A malicious operator may attempt to register impressions for multiple sites, thus exceeding the amount of information that would otherwise be released through private attribution. § 7.2.2 Safety Limits discusses establishing additional cross-site privacy budgets to mitigate this possibility.
Rate limits on calls to the Private Attribution APIs could also be an effective mechanism to prevent harvesting information through overuse of the APIs.
8.5. Ad Fraud
As with many technologies, advertising on the web has been the subject of various kinds of fraud.
Fraudulent registration of impressions is a particular concern with the Private Attribution API, because impressions are stored only on the device. It is not possible to apply server-side intelligence to identify fraudulent impressions and exclude them from attribution. Conversely, even though conversion reports are encrypted, because the reports are sent to a server, the server can make a determination that the conversion is likely fraudulent and exclude it from aggregation.
An important mitigation against malicious use of the Private Attribution APIs is the explicit specification of eligible conversion sites when registering an impression, and of eligible impression sites and ad IDs when registering a conversion. This prevents impressions on arbitrary malicious sites from interfering with attribution to the intended set of candidate impressions.
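The mutual filtering described above can be sketched as follows. The `Impression` type, its field names, and the function name are hypothetical, and matching on ad IDs is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    impression_site: str               # site on which the impression was registered
    conversion_sites: frozenset[str]   # sites eligible to convert this impression


def candidate_impressions(store: list[Impression],
                          conversion_site: str,
                          impression_sites: set[str]) -> list[Impression]:
    """Keep only impressions that both sides have declared eligible."""
    return [
        imp for imp in store
        if conversion_site in imp.conversion_sites
        and imp.impression_site in impression_sites
    ]


store = [
    Impression("news.example", frozenset({"shop.example"})),
    Impression("evil.example", frozenset({"shop.example"})),
    Impression("news.example", frozenset({"other.example"})),
]
matches = candidate_impressions(store, "shop.example", {"news.example"})
```

The impression registered on `evil.example` names `shop.example` as a conversion site, but it is excluded because the conversion did not list `evil.example` as an eligible impression site.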
9. Privacy Considerations
9.1. Information Exposed by the Private Attribution API
The impression store and privacy budget store contain information about a cross-section of browsing activity. As use of the API increases, so does the scope of this information. However, most of the information written to these stores is never disclosed. Because attribution is performed on the device (on-device attribution), only information about attributed conversions is exposed by the Private Attribution API. This contrasts with other schemes in which information about both impressions and conversions is sent to the aggregation service for off-device attribution. In the latter class of schemes, the amount of information that could be revealed in a compromise of the aggregation service (or in a compromise of communication with the aggregation service) is significantly larger.
When the Private Attribution API makes an attribution, information about that attribution is released from the device only to the extent the differential privacy restrictions allow.
While the Private Attribution API is intended to measure the association of relatively infrequent conversion events with a limited set of related impression candidates, it is important to consider how the API might be misused for larger-scale data collection. The requirement that impressions enumerate the possible conversion sites (and vice-versa) has an important role in preventing misuse of the API for mass data collection, and in making attempts at such misuse more visible.
It is unclear whether the privacy budget store should be cleared whenever the impression store is cleared. On one hand, it contains information about browsing activity, so it is desirable to include it when clearing browsing activity. On the other hand, it is only possible to strictly adhere to the requirements of the differential privacy mechanism if information about a fully or partially depleted privacy budget is maintained until that budget is no longer relevant (i.e., the end of the week).
9.2. Disabling the Private Attribution API
The Private Attribution API is designed to reveal only aggregate information. The use of differential privacy limits the chance of determining whether any particular user contributed to the aggregated output. However, some users may still prefer not to participate in attribution measurement. As discussed in § 3.7.1 Optional Participation, the user agent must provide a mechanism for the user to disable the Private Attribution API.
To minimize the risk of fingerprinting, and to prevent discrimination against users who choose to disable the Private Attribution API, sites must not be able to detect that the API is disabled. Specifically, all calls to the Private Attribution API that are otherwise valid must complete successfully, even when the API is disabled. The only difference in behavior is that conversion reports returned when the API is disabled will never report any conversion value. Because the reports are encrypted, this difference cannot be detected by the site receiving the conversion report.
9.3. Use in Third-party Contexts
The Private Attribution API is available even in third-party contexts. In particular, a third-party iframe may call saveImpression(). Note, however, that the impression is recorded with the site of the top-level navigation context, not the origin of the iframe.
While the availability of the API in third-party contexts carries some increase in privacy risk, this support is deemed necessary because iframes are commonly used to display advertisements.
10. Acknowledgements
This specification is the result of a lot of work from many people. The broad shape of this level of the API is based on an idea from Luke Winstrom. The privacy architecture is courtesy of the authors of [PPA-DP].