Privacy-Preserving Attribution: Level 1

Draft Community Group Report,

More details about this document
This version:
https://private-attribution.github.io/api/
Issue Tracking:
GitHub
Inline In Spec
Editors:
(Mozilla)
(Mozilla)
(Meta)

Abstract

This specifies a browser API for the measurement of advertising performance. The goal is to produce aggregate statistics about how advertising leads to conversions, without creating a risk to the privacy of individual web users. This API collates information about people from multiple web origins, which could be a significant risk to their privacy. To manage this risk, the information that is gathered is aggregated using an aggregation service that is trusted by the user-agent to perform aggregation within strict limits. Noise is added to the aggregates produced by this service to provide differential privacy. Websites may select an aggregation service from the list of approved aggregation services provided by the user-agent.

Status of this document

This specification is a proposal that is intended to be migrated to the W3C standards track. It is not a standard.

1. Introduction

This document defines a simple API for browsers that enables the collection of aggregated, differentially-private metrics.

The primary goal of this API is to enable attribution for advertising.

1.1. Attribution

In advertising, attribution is the process of identifying actions that precede an outcome of interest, and allocating value to those actions.

Actions that are of interest to advertisers are primarily the showing of advertisements (also referred to as impressions). Other actions include ad clicks (or other interactions) and opportunities to show ads that were not taken.

Desired outcomes for advertising are more diverse, as they include any result that an advertiser seeks to improve through the showing of ads. A desirable outcome might also be referred to as a conversion, which refers to "converting" a potential customer into a customer. What counts as a conversion could include sales, subscriptions, page visits, and enquiries.

For this API, actions and outcomes are both events: things that happen once. What is unique about attribution for advertising is that these events might not occur on the same site. Advertisements are most often shown on sites other than the advertiser’s site.

The primary challenge with attribution is in maintaining privacy. Attribution involves connecting activity on different sites. The goal of attribution is to find an impression that was shown to the same person before the conversion occurred.

If attribution information were directly revealed, it would enable unwanted cross-context recognition, thereby enabling tracking.

This document avoids cross-context recognition by ensuring that attribution information is aggregated using an aggregation service. The aggregation service is trusted to compute an aggregate without revealing the values that each person contributes to that aggregate.

Strict limits are placed on the amount of information that each browser instance contributes to the aggregates for a given site. Differential privacy is used to provide additional privacy protection for each contribution.

Details of aggregation service operation are included in § 5 Aggregation. The differential privacy design used is outlined in § 6 Differential Privacy.

1.2. Background

From the early days of the Web, advertising has been widely used to financially support the creation of sites.

One characteristic that distinguished the Web from other venues for advertising was the ability to obtain information about the effectiveness of advertising campaigns.

Web advertisers were able to measure key metrics like reach (how many people saw an ad), frequency (how often each person saw an ad), and conversions (how many people saw the ad then later took the action that the ad was supposed to motivate). These measurements were far more timely and accurate than those available for any other medium.

The cost of measuring performance was privacy. In order to produce accurate and comprehensive information, advertising businesses performed extensive tracking of the activity of all Web users. Each browser was given a tracking identifier, often using cookies that were lodged by cross-site content. Every action of interest was logged against this identifier, forming a comprehensive record of a person’s online activities.

Having a detailed record of a person’s actions allowed advertisers to infer characteristics about people. Those characteristics made it easier to choose the right audience for advertising, greatly improving its effectiveness. This created a strong incentive to gather more information.

Online advertising is intensely competitive. Sites that show advertising seek to obtain the most money for each ad placement. Advertisers seek to place advertising where it will have the most effect relative to its cost. Any competitive edge gained by these entities, and by the intermediaries that operate on their behalf, depends on having more comprehensive information about a potential audience.

Over time, actions of interest expanded to include nearly every aspect of online activity. Methods were devised to correlate that information with activity outside of the Web. An energetic trade has formed, with multiple purveyors of personal information that is traded for various purposes.

1.3. Goals

The goal of this document is to define a means of performing attribution for advertising that does not enable tracking.

1.4. End-User Benefit

The measurement of advertising performance creates new cross-site flows of information. That information flow creates a privacy risk or cost, namely cross-context recognition, that needs to be justified in terms of benefits to end users.

Any benefits realized by end users through the use of attribution are indirect.

End users that visit a website pay for "free" content or services primarily through their attention to any advertisements the site shows them. This "value" accrues to the advertiser, who in turn pays the site. The site is expected to use this money to support the provision of their content or services.

+-------------+            +------------+
|             |            |            |
|    User     +----------->| Advertiser |
|             | Attention  |            |
+-------------+            +-----+------+
      ^                          |
      | Content and              | Money
      | Services                 |
      |                          v
 .----+------.             +------------+
|   Content   |            |            |
| Production  | Investment |            | Profit,
|      /      |<-----------+  Website   +------->
|   Service   |            |            | Expenses,
| Improvement |            |            | etc...
 '-----------'             +------------+
Value exchange for advertising-supported content and services

Participation in an attribution measurement system would constitute a secondary cost to Web users.

Support for attribution enables more effective advertising, largely by informing advertisers about what ads perform best, and in what circumstances. Those circumstances might include the time and place that the ad is shown, the person to whom the ad is presented, and the details of the ad itself.

Connecting that information to outcomes allows an advertiser to learn what circumstances most often lead to the outcomes they most value. That allows advertisers to spend more on effective advertising and less on ineffective advertising. This lowers the overall cost of advertising relative to the value obtained. [ONLINE-ADVERTISING]

Sites that provide advertising inventory, such as content publishers and service providers, indirectly benefit from more efficient advertising. Venues for advertising that are better able to show ads that result in the outcomes that advertisers seek can charge more for ad placements.

Sites that obtain support through the placement of advertisements are better able to provide quality content or services. Importantly, that support is derived unevenly from their audience. This can be more equitable than other forms of financial support. Those with a lower tendency or ability to spend on advertised goods obtain the same ad-supported content and services as those who can afford to pay. [EU-AD][COPPACALYPSE]

The ability to supply "free" services supported by advertising has measurable economic benefit that derives from the value of those services. [FREE-GDP]

1.5. Collective Privacy Effect

The use of aggregation, if properly implemented, ensures that information provided to sites is about groups and not individuals.

The introduction of this mechanism therefore represents collective decision-making, as described in Privacy Principles § collective-privacy.

Participation in attribution measurement carries a lower privacy cost when the group that participates is larger. This is due to the effect of aggregation on the ability of sites to extract information about individuals from aggregates. This is especially true for central differential privacy, which is the mathematical basis for the privacy design used in this specification.

Larger cohorts of participants also produce more representative, and therefore more useful, statistics about the advertising that is being measured.

If attribution is justified, both these factors motivate the enablement of attribution for all users.

Acting to enable attribution measurement by user agents will not be positively received by some people. Different people perceive the costs and benefits that come from engaging with advertising differently. The proposed design allows people the option of appearing to participate in attribution without revealing that choice to sites; see § 3.7.1 Optional Participation.

1.6. Attribution Using Histograms

Attribution attempts to measure correlation between one or more ad placements (impressions) and the outcomes that an advertiser desires.

When considered in the aggregate, information about individuals is not useful. Actions and outcomes need to be grouped.

The simplest form of attribution splits impressions into a number of groupings according to the attributes of the advertisement and counts the number of conversions. Groupings might be formed from attributes such as where the ad is shown, what was shown (the "creative"), when the ad was shown, or to whom.

These groupings and the tallies of conversions attributed to each form a histogram. Each bucket of the histogram counts the conversions for a group of ads.

|
+------+
|      | example.com
+------+----------+
|                 | news.example
+---------+-------+
|         | classified.example
+-+-------+
| | search.example
+-+
|
Sample histogram for conversion counts, grouped by the site where the impressions were shown

Different groupings might be used for different purposes. For instance, grouping by creative (the content of an ad) might be used to learn which creative works best.

Adding a value greater than one at each conversion enables more than simple counts. Histograms can also aggregate values, which might be used to differentiate between different outcomes. The value that is allocated to impressions is called a conversion value. A higher conversion value might be used for larger purchases or any outcome that is more highly-valued. A conversion value might also be split between multiple impressions to split credit, though this capability is not presently supported in the API.
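This example is non-normative. As a rough illustration of the histogram described in this section, the following sketch tallies conversion values into buckets. The `tallyHistogram` helper and the shape of its input records are assumptions made for this example; neither is part of the API.

```javascript
// Hypothetical helper, not part of the API: tally attributed conversions
// into a fixed-size histogram.
function tallyHistogram(size, attributedConversions) {
  const histogram = new Array(size).fill(0);
  for (const { bucket, value } of attributedConversions) {
    // Ignore contributions that fall outside the histogram.
    if (Number.isInteger(bucket) && bucket >= 0 && bucket < size) {
      histogram[bucket] += value;
    }
  }
  return histogram;
}

// Buckets 0..3 stand in for four groups of impressions, such as the four
// sites in the sample figure.
const counts = tallyHistogram(4, [
  { bucket: 0, value: 1 },
  { bucket: 1, value: 1 },
  { bucket: 1, value: 1 },
  { bucket: 2, value: 3 }, // a higher conversion value, e.g. a larger purchase
]);
console.log(counts); // buckets hold 1, 2, 3 and 0
```

A conversion value of 1 for every conversion yields plain counts; larger values weight some outcomes more heavily than others.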

2. Overview of Operation

The private attribution API provides aggregate information about the association between two classes of events: impressions and conversions.

An impression is any action that an advertiser takes on any website. The API does not constrain what can be recorded as an impression. Typical actions that an advertiser might seek to measure include:

For the API, a conversion is an outcome that is being measured. The API does not constrain what might be considered to be an outcome. Typical outcomes that advertisers might seek to measure include:

The remainder of this section describes how the Private Attribution API operates in conjunction with an aggregation service to produce an aggregate attribution measurement. That operation is illustrated in the following figure.

[Figure: Publisher Site(s) call saveImpression and the Advertiser Site calls measureConversion on the Private Attribution APIs, which are backed by the Impression Store. The resulting Conversion Report is returned to the Advertiser Site and passed to the Advertiser Server, which also receives Conversion Reports from Other Users. The Advertiser Server sends Conversion Reports to the Aggregation Service and receives a Histogram in return.]
Overview of Private Attribution Operation

When an impression occurs, the saveImpression() method can be used to request that the browser save information. This includes an identifier for the impression and some additional information about the impression. For instance, advertisers might use additional information to record whether the impression was an ad view or an ad click.

At conversion time, a conversion report is created. A conversion report is an encrypted histogram contribution that includes information from any impressions that the browser previously stored.

The measureConversion() method accepts options that tell the browser how to construct a conversion report. These include a simple query that selects from the impressions that the browser has stored, a conversion value that is allocated to the selected impression(s), and other information needed to construct the conversion report.

The histogram created by the conversion report is constructed as follows:

The browser updates the privacy budget store to reflect the reported conversion.

The resulting histogram is prepared for aggregation according to the requirements of the chosen aggregation service and returned to the site. This minimally involves encryption of the histogram.

A site that invokes this API will always receive a valid conversion report. As a result, sites learn nothing about what happened on other sites from this interaction.

The site can collect the encrypted histograms it receives from calls to this API and submit them to the aggregation service.

Upon receiving a set of encrypted histograms from a site, the aggregation service:

  1. confirms that it has not previously computed an aggregate from the provided inputs and that there are enough conversion reports,

  2. sums the histograms and adds sufficient noise to produce a differentially-private aggregate histogram, and

  3. returns the aggregate to the site.
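This example is non-normative. The three steps above might be sketched as follows. Reports are shown as already-decrypted histograms of equal length, and the report `id` field, the `minReports` threshold, and the `addNoise` callback are all assumptions of this sketch; the actual noise mechanism and replay protection are the subject of later sections.

```javascript
// Hypothetical sketch of the aggregation steps. `seenIds` is a Set that
// persists across calls and remembers which reports were already used.
function aggregate(reports, seenIds, { minReports, addNoise }) {
  // Step 1: require enough reports and reject any that were seen before.
  if (reports.length < Math.max(1, minReports)) {
    throw new Error("not enough conversion reports");
  }
  const batch = new Set();
  for (const { id } of reports) {
    if (seenIds.has(id) || batch.has(id)) {
      throw new Error("conversion report was already aggregated");
    }
    batch.add(id);
  }
  batch.forEach((id) => seenIds.add(id));

  // Step 2: sum the histograms bucket by bucket, then add noise per bucket.
  const total = new Array(reports[0].histogram.length).fill(0);
  for (const { histogram } of reports) {
    histogram.forEach((v, i) => {
      total[i] += v;
    });
  }

  // Step 3: return the noisy aggregate.
  return total.map((v) => v + addNoise());
}
```

Passing the noise source as a callback keeps the sketch independent of any particular differential privacy mechanism.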

3. API Details

A site using the Private Attribution API will typically register either impressions or conversions, but in some cases the same site may do both.

To register an impression, a site calls saveImpression(). No preparation is required to use this API beyond collecting parameter values, although it may be useful to examine the supported aggregationServices in deciding whether to use the Private Attribution API.

To request a conversion report, a site calls measureConversion(). Before calling this API, a site must select a supported aggregation service. The page may select any of the supported services found in aggregationServices. The name of the selected service must be supplied as the aggregator member of the PrivateAttributionConversionOptions dictionary when calling the measureConversion() method.

This section needs to be more precise about site vs. origin.

3.1. Finding a Supported Aggregation Service

Is any additional information required in the PrivateAttributionAggregationService dictionary? Do we want to rename apiVersion to protocol? And we should definitely define an enum for it.

The aggregationServices attribute contains a list of aggregation services supported by the user agent. The page must select and specify one of these services when calling the measureConversion() method. It may also be useful to query the supported services before registering an impression, but that is not required, and impressions are not scoped to a single aggregation service.

dictionary PrivateAttributionAggregationService {
  required DOMString name;
  required DOMString apiVersion;
};

[SecureContext, Exposed=Window]
interface PrivateAttribution {
  attribute FrozenArray<PrivateAttributionAggregationService> aggregationServices;
};

The aggregationServices attribute contains the following information about each supported aggregation service:

name, of type DOMString
Name of the aggregation service. This is passed as the aggregator parameter to measureConversion().
apiVersion, of type DOMString
Version of the Private Attribution API supported by this aggregator. Even if an aggregator supports multiple versions of the API, it is expected to assign a unique aggregation service name for each supported version. Thus, the API version is implicit in the aggregator selection and does not need to be passed to measureConversion().
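This example is non-normative. A page might select a service along the following lines. The `pickAggregationService` helper is hypothetical, and the `apiVersion` value shown is a placeholder rather than a defined version string.

```javascript
// Hypothetical helper: return the name of the first supported service that
// the site is also willing to use, or null if there is none.
function pickAggregationService(supported, acceptableNames) {
  const match = supported.find((s) => acceptableNames.includes(s.name));
  return match ? match.name : null;
}

// In a page, `supported` would be read from
// navigator.privateAttribution.aggregationServices; a literal stands in
// here so that the sketch runs outside a browser.
const supported = [{ name: "aggregator.example", apiVersion: "1" }];
const aggregator = pickAggregationService(supported, ["aggregator.example"]);
```

The resulting name is what the page would pass as the aggregator member when calling measureConversion().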

3.2. Saving Impressions

The saveImpression() method requests that the user agent record an impression in the impression store.

navigator.privateAttribution.saveImpression({
  histogramIndex: 3,
  filterData: 2,
  conversionSite: "advertiser.example",
  lifetimeDays: 7,
});

dictionary PrivateAttributionImpressionOptions {
  required unsigned long histogramIndex;
  required unsigned long filterData;
  required DOMString conversionSite;
  unsigned long lifetimeDays;
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] undefined saveImpression(PrivateAttributionImpressionOptions options);
};

The arguments to saveImpression() are as follows:

histogramIndex, of type unsigned long
If measureConversion() matches this impression with a subsequent conversion, the conversion value will be added to the histogram bucket identified by this index.
filterData, of type unsigned long
An optional piece of metadata associated with the impression. The filterData can be used to identify which impressions may receive attribution from a conversion.
conversionSite, of type DOMString
The site where conversions for this impression may occur, identified by its domain name. The measureConversion() method will only attribute to this impression when called by the indicated site.
lifetimeDays, of type unsigned long
A "time to live" (in days) after which the impression can no longer receive attribution. The user agent should impose an upper limit on the lifetime, and silently reduce the value specified here if it exceeds that limit.

3.2.1. Operation

  1. Collect the implicit API inputs:

    1. The current timestamp

    2. The impression site domain

    3. The iframe site domain

  2. Validate the page-supplied API inputs

  3. If the private attribution API is enabled, save the impression to the impression store.

saveImpression() does not return a status indicating whether the impression was recorded. This minimizes the ability to detect when the Private Attribution API is disabled.
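This example is non-normative. The operation above might look roughly like this inside a user agent. The `saveImpressionSketch` function, the `context` object, the 30-day cap, and the use of the iframe site as the stored intermediary site are assumptions of this sketch.

```javascript
// Hypothetical user agent cap on impression lifetime; no value is specified.
const MAX_LIFETIME_DAYS = 30;

function saveImpressionSketch(store, enabled, context, options) {
  // Step 1: implicit inputs are supplied by the user agent, not the page.
  const { now, impressionSite, iframeSite } = context;

  // Step 2: validate the page-supplied inputs.
  if (!Number.isInteger(options.histogramIndex) || options.histogramIndex < 0) {
    throw new TypeError("histogramIndex must be a non-negative integer");
  }
  if (typeof options.conversionSite !== "string" || options.conversionSite === "") {
    throw new TypeError("conversionSite must be a non-empty string");
  }
  // Lifetimes above the cap are silently reduced. Falling back to the cap
  // when lifetimeDays is absent is an assumption.
  const lifetimeDays = Math.min(
    options.lifetimeDays ?? MAX_LIFETIME_DAYS,
    MAX_LIFETIME_DAYS
  );

  // Step 3: record the impression only when the API is enabled.
  if (enabled) {
    store.push({
      histogramIndex: options.histogramIndex,
      filterData: options.filterData,
      conversionSite: options.conversionSite,
      lifetimeDays,
      timestamp: now,
      impressionSite,
      intermediarySite: iframeSite, // assumed mapping
    });
  }
  // Nothing is returned in either case, so the caller cannot tell whether
  // the impression was stored.
}
```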

3.3. Requesting Attribution for a Conversion

The measureConversion() method requests that the user agent perform attribution for a conversion, and return a conversion report.

The measureConversion() method always returns a conversion report, regardless of whether matching impression(s) are found. If there is no match, or if differential privacy disallows reporting the attribution, the returned conversion report will not contribute to the histogram, i.e., will be uniformly zero.

navigator.privateAttribution.measureConversion({
  // name of the aggregation service
  aggregator: "aggregator.example",

  // the number of buckets in the histogram
  histogramSize: 20,
  // the amount of privacy budget to use
  epsilon: 1,

  // the attribution logic to use
  logic: "last-touch",
  // the value to assign to the histogram index of the impression
  value: 3,
  // the maximum value which can be generated across all reports included in the aggregation
  // used together with epsilon to calibrate the differential privacy budget to use
  maxValue: 5,

  // only consider impressions within the last N days
  lookbackDays: 30,
  // an optional filter to restrict the set of ads that can be attributed
  filterData: 2,
  // an optional list of sites where impressions might have been registered
  impressionSites: ["publisher.example"],
  // an optional list of sites which called the saveImpression API
  intermediarySites: ["ad-tech.example"],
});

dictionary PrivateAttributionConversionOptions {
  required DOMString aggregator;
  double epsilon = 1.0;

  required unsigned long histogramSize;

  PrivateAttributionLogic logic = "last-touch";
  unsigned long value = 1;
  unsigned long maxValue = 1;

  unsigned long lookbackDays;
  unsigned long filterData;
  sequence<DOMString> impressionSites = [];
  sequence<DOMString> intermediarySites = [];
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] Promise<Uint8Array> measureConversion(PrivateAttributionConversionOptions options);
};

The arguments to measureConversion() are as follows:

aggregator, of type DOMString
A selection from the aggregation services that can be found in aggregationServices.
epsilon, of type double, defaulting to 1.0
The amount of privacy budget to expend on this conversion report.
histogramSize, of type unsigned long
The number of histogram buckets to use in the conversion report.
logic, of type PrivateAttributionLogic, defaulting to "last-touch"
A selection from PrivateAttributionLogic indicating the attribution logic to use.
value, of type unsigned long, defaulting to 1
The conversion value. If an attribution is made and privacy restrictions are satisfied, this value will be encoded into the conversion report.
maxValue, of type unsigned long, defaulting to 1
The maximum conversion value across all contributions included in the aggregation. Together with epsilon, this is used to calibrate the distribution of random noise that will be added to the outcome. It is also used to determine the amount of privacy budget to expend on this conversion report.
lookbackDays, of type unsigned long
An integer number of days. Only impressions occurring within the past lookbackDays may match this conversion.
filterData, of type unsigned long
Only impressions having a filterData value matching this value will be eligible to match this conversion.
impressionSites, of type sequence<DOMString>, defaulting to []
A list of impression sites. Only impressions recorded where the top-level site is on this list are eligible to match this conversion.
intermediarySites, of type sequence<DOMString>, defaulting to []
A list of sites which called the saveImpression() API. Only impressions recorded by scripts originating from one of the intermediary sites are eligible to match this conversion.

3.3.1. Operation

  1. Collect the implicit API inputs

    1. The current timestamp

    2. The conversion site domain

    3. The iframe site domain

  2. Validate the page-supplied API inputs

    1. If logic is specified, and the value is anything other than "last-touch", return an error.

  3. If the private attribution API is enabled, invoke the routine to fill a histogram using last-touch attribution.

  4. Encrypt the report.

  5. Return the encrypted report.
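This example is non-normative. The steps above might be sketched as follows. The `fillHistogram` and `encrypt` callbacks are placeholders supplied by the caller, because report encryption is not yet specified. Producing an all-zero histogram when the API is disabled is an assumption, based on the statement that a site always receives a valid conversion report. The real method returns a promise; this sketch is synchronous for brevity.

```javascript
// Hypothetical sketch of the measureConversion() steps.
function measureConversionSketch(enabled, options, fillHistogram, encrypt) {
  // Step 2: only "last-touch" is currently defined.
  const logic = options.logic ?? "last-touch";
  if (logic !== "last-touch") {
    throw new TypeError("unsupported attribution logic: " + logic);
  }

  // Step 3: attribute only when enabled; otherwise contribute nothing.
  const histogram = enabled
    ? fillHistogram(options)
    : new Array(options.histogramSize).fill(0);

  // Steps 4 and 5: encrypt the report and return it.
  return encrypt(histogram);
}
```

Because both branches produce a report of the same size and both are encrypted, the caller cannot distinguish an enabled API with no attribution from a disabled one.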

3.4. Impression Store

The impression store is used by the measureConversion() method to find matching impressions.

3.4.1. Contents

The impression store must store the following information:

Filter Data
The filterData value passed to saveImpression().
Impression Site
The site that called saveImpression().
Intermediary Site
The site corresponding to the script that called saveImpression().
Conversion Sites
The conversion site(s) that were passed to saveImpression().
Timestamp
The time at which saveImpression() was called.
Lifetime
The number of days an impression remains eligible for attribution, either from the call to saveImpression(), or a user agent-defined limit.
Histogram Index
The histogram index passed to saveImpression().

3.4.2. Maintenance

The user agent should periodically use the timestamp and lifetime values to identify and delete any impressions in the impression store that have expired.

It is not necessary to remove impressions immediately upon expiry, as long as measureConversion() excludes expired impressions from attribution. However, the user agent should not retain expired impressions indefinitely.
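This example is non-normative. Expiry might be checked as sketched below. The record fields (`timestamp` in milliseconds, `lifetimeDays`) and the helper names are assumptions, as is treating an impression as expired only once its full lifetime has elapsed.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// An impression has expired once `now` is past its timestamp plus lifetime.
function isExpired(impression, now) {
  return now > impression.timestamp + impression.lifetimeDays * DAY_MS;
}

// Periodic maintenance: keep only the impressions that are still live.
function purgeExpired(store, now) {
  return store.filter((impression) => !isExpired(impression, now));
}
```

The same `isExpired` check can be applied at attribution time, which is what allows the purge itself to run lazily.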

3.4.3. Clearing

A mechanism must be provided to clear the impression store. For example, the impression store could be cleared upon activation of the control that disables the Private Attribution API. It is recommended that any mechanism a user agent provides to clear stored browsing data (history, cookies, etc.) be extended to cover the impression store.

3.5. Privacy Budget Store

The privacy budget store records the state of the per-site privacy budgets, and of any safety limits. It is updated by the deduct privacy budget algorithm.

The privacy budget store needs to be described in more detail. Some references to clearing the impression store may need to be updated to refer to the privacy budget store as well.

3.6. Attribution Logic

A site that measures conversions can specify attribution logic, which determines how the conversion value is allocated to histogram buckets. The measureConversion() function accepts a logic parameter that specifies the attribution logic.

enum PrivateAttributionLogic {
  "last-touch",
};

Each attribution logic specifies a process for allocating values to histogram buckets. This logic includes how to select impressions, how to handle weeks in which the privacy budget is insufficient, and (optionally) how to process any additional parameters that might be used.

3.6.1. Last Touch Attribution

The "last-touch" attribution logic indicates that the browser should select the last (most recent) impression that matches the common matching logic. The entire conversion value (up to the maximum imposed by the privacy budget) is allocated to the histogram bucket that was saved with the impression.

Last touch attribution does not select any impression that was saved during a week that does not have sufficient privacy budget. If impressions match from a week that does not have enough privacy budget, impressions are not matched for any preceding weeks. That is, once a week has a matching impression and insufficient budget, the process will set a value of zero for all histogram buckets.

To fill a histogram using last-touch attribution, given options:

  1. Initialize impression to a null value.

  2. Initialize value to options.value.

  3. Let now be the current time.

  4. For each week starting from the current week to the oldest week supported by the user agent:

    1. Let impressions be the result of invoking common matching logic with options, week, and now.

    2. If impressions is not empty:

      1. Retain the value of week.

      2. Set impression to the value in impressions with the most recent impression.timestamp.

      3. Exit the loop.

  5. If impression is null, let budgetOk be false.

  6. Otherwise, let budgetOk be the result of deduct privacy budget with week and options.epsilon.

  7. If budgetOk is false, set value to 0.

  8. If impression is not null and impression.histogramIndex is options.histogramSize or greater, set value to 0.

  9. If value is not 0, set index to impression.histogramIndex.

  10. Otherwise, set index to 0.

  11. Return a histogram containing options.histogramSize values, with a value of value at an index of index and a value of zero at all other indices.
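This example is non-normative. The algorithm above might be sketched as follows. The list of weeks (ordered newest first) and the `matchImpressions` and `deductBudget` callbacks stand in for the common matching logic and the deduct privacy budget algorithm; their signatures are assumptions of this sketch.

```javascript
// Hypothetical sketch of filling a histogram using last-touch attribution.
function fillHistogramLastTouch(options, weeks, matchImpressions, deductBudget, now) {
  let impression = null; // step 1
  let value = options.value; // step 2
  let matchedWeek = null;

  // Step 4: walk from the current week to the oldest, stopping at the
  // first week that has any matching impression.
  for (const week of weeks) {
    const impressions = matchImpressions(options, week, now);
    if (impressions.length > 0) {
      matchedWeek = week;
      impression = impressions.reduce((a, b) =>
        b.timestamp > a.timestamp ? b : a
      );
      break;
    }
  }

  // Steps 5 to 7: budget is deducted only if there was a match.
  const budgetOk =
    impression !== null && deductBudget(matchedWeek, options.epsilon);
  if (!budgetOk) value = 0;

  // Step 8: an out-of-range bucket contributes nothing.
  if (impression !== null && impression.histogramIndex >= options.histogramSize) {
    value = 0;
  }

  // Steps 9 to 11: place the value in a single bucket.
  const histogram = new Array(options.histogramSize).fill(0);
  if (value !== 0) histogram[impression.histogramIndex] = value;
  return histogram;
}
```

Note that, as in the steps above, budget is deducted before the bucket index is checked, so an out-of-range index still consumes budget.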

3.6.2. Common Impression Matching Logic

TODO specify how to match using "lookbackDays", "filterData" and "impressionSites".

Discuss "infinite" lookbackDays. Clarify when it applies. When field is missing? Zero?

To perform common matching logic, given options, week, and moment now:

  1. If the number of days since the end of week exceeds options.lookbackDays, return an empty set.

  2. Initialize matching to an empty set.

  3. For each impression in the saved impressions for the week:

    1. If now - options.lookbackDays is after impression.timestamp, continue the loop.

    2. If options.filterData does not match impression.filterData, continue the loop.

    3. If options.impressionSites does not contain impression.impressionSite, continue the loop.

    4. Add impression to matching.

  4. Return matching.
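This example is non-normative. The steps above might be sketched as follows. The shape of `week` (an `end` time plus an `impressions` array) and the use of millisecond timestamps are assumptions. The sketch follows the steps literally: it requires options.lookbackDays to be present, and an empty impressionSites list matches nothing. Both behaviours are open questions in this section.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical sketch of the common matching logic.
function commonMatchingLogic(options, week, now) {
  const lookbackMs = options.lookbackDays * DAY_MS;

  // Step 1: skip weeks that ended before the lookback window began.
  if (now - week.end > lookbackMs) return [];

  // Steps 2 and 3: filter the impressions saved during this week.
  const matching = [];
  for (const impression of week.impressions) {
    if (now - lookbackMs > impression.timestamp) continue;
    if (options.filterData !== impression.filterData) continue;
    if (!options.impressionSites.includes(impression.impressionSite)) continue;
    matching.push(impression);
  }

  // Step 4
  return matching;
}
```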

3.7. User Control and Visibility

3.7.1. Optional Participation

Text fragment moved from privacy section:

This mechanism may be a dedicated control for the Private Attribution API, or it may be a consolidated privacy control that applies to multiple features, including private attribution. Further, user agent developers should consider interaction of other privacy modes with the Private Attribution API. For example, attribution might be disabled in a private browsing mode, or it might be disabled if the user has opted out of collection of diagnostic data.

3.7.2. Visibility

4. Implementation Considerations

5. Aggregation

An aggregation service takes multiple pieces of attribution information and produces an aggregate metric.

Each browser will have different requirements for aggregation.

5.1. Multi-Party Computation Aggregation

TODO

5.2. Trusted Execution Environments

TODO

5.3. Conversion Report Encryption

TODO

5.4. Anti-Replay Requirements

Conversion reports generated by browsers are bound to the amount of privacy budget that was expended by the site that requested the report.

TODO

6. Differential Privacy

This design uses the concept of differential privacy as the basis of its privacy design. [PPA-DP]

Differential privacy is a mathematical definition of privacy that can bound the amount of private information that is revealed by a system. [DP] Differential privacy is not the only means by which privacy is protected in this system, but it is the most rigorously defined and analyzed. As such, it provides the strongest privacy guarantees.

Differential privacy uses randomized noise to hide private data contributions to an aggregated dataset. The effect of noise is to hide individual contributions to the dataset, but to retain the usefulness of any aggregated analysis.

To apply differential privacy, it is necessary to define what information is protected. In this system, the protected information is the impressions of a single user profile, on a single user agent, over a single week, for a single website that registers conversions. § 6.1 Privacy Unit describes the implications of this design in more detail.

This attribution design uses a form of differential privacy called individual differential privacy. In this model, user agents are each separately responsible for ensuring that they limit the information that is contributed.

The individual differential privacy design of this API has three primary components:

  1. User agents limit (using the privacy budget) the amount of information about impressions that leaves the device through conversion reports. § 6.2 Privacy Budgets explores this in greater depth.

  2. Aggregation services ensure that any given conversion report is only used in accordance with the privacy budget that was accounted for it by the user agent. § 5.4 Anti-Replay Requirements describes requirements on aggregation services in more detail.

  3. Noise is added by aggregation services. § 6.3 Differential Privacy Mechanisms details the mechanisms that might be used.

Together, these measures place limits on the information that is released for each privacy unit.

6.1. Privacy Unit

An implementation of differential privacy requires a clear definition of what is protected. This is known as the privacy unit, which represents the entity that receives privacy protection.

This system adopts a privacy unit that is the combination of three values:

  1. A user agent profile. That is, an instance of a user agent, as used by a single person.

  2. The site that requests information about impressions.

    The sites that register impressions are not considered. Those sites do not receive information from this system directly.

  3. The current week.

A change to any of these values produces a new privacy unit, which results in a separate privacy budget. Each site that a person visits receives a bounded amount of information for each week.
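As a non-normative illustration, the three values can be thought of as a composite key into a table of privacy budgets. The following is a minimal sketch under stated assumptions: the names `PrivacyUnit`, `weekIndex`, and `privacyUnitKey`, the `profileId` field, and the idea of counting weeks from a fixed start time are all hypothetical and are not defined by this specification.

```typescript
// Hypothetical sketch: none of these names come from the specification.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

interface PrivacyUnit {
  profileId: string; // one user agent profile (assumed identifier)
  site: string;      // the site requesting information about impressions
  week: number;      // index of the week, counted from an assumed start time
}

// Maps a timestamp to a week index relative to an assumed start time.
function weekIndex(timestampMs: number, startMs: number): number {
  return Math.floor((timestampMs - startMs) / WEEK_MS);
}

// Serializes a privacy unit into a string usable as a budget-table key.
function privacyUnitKey(unit: PrivacyUnit): string {
  return JSON.stringify([unit.profileId, unit.site, unit.week]);
}

// Changing any one of the three values yields a different key.
const base: PrivacyUnit = { profileId: "profile-1", site: "shop.example", week: weekIndex(0, 0) };
const nextWeek: PrivacyUnit = { profileId: "profile-1", site: "shop.example", week: weekIndex(WEEK_MS, 0) };
console.log(privacyUnitKey(base) !== privacyUnitKey(nextWeek)); // true
```

In practice a user agent profile already scopes its own storage, so an implementation might key its budgets on site and week alone.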

Ideally, the privacy unit is a single person. Though ideal, it is not possible to develop a useful system that guarantees perfect correspondence with a person, for a number of reasons:

6.1.1. Browser Instances

Each browser instance manages a separate privacy budget.

Coordination between browser instances might be possible, but is not expected. That coordination might allow privacy to be improved by reducing the total amount of information that is released. It might also improve the utility of attribution by allowing impressions on one browser instance to be converted on another.

Coordination across different implementations is presently out of scope for this work. Implementations can perform some coordination between instances that are known to be for the same person, but this is not mandatory.

6.1.2. Per-Site Limits

Information is released to websites on a per-site basis. This aligns with the same boundary used in other privacy-relevant functions.

A finer privacy unit, such as an origin, would make it trivial to obtain additional information. Information about the same person could be gathered from multiple origins. That information could then be combined by exploiting the free flow of information within the site, using cookies [COOKIES] or similar.

§ 6.2.2 Safety Limits discusses attacks that exploit this limit and some additional safety limits that might be implemented by user agents to protect against those attacks.

6.1.3. Privacy Budget Epochs

Sites receive a separate differential privacy budget for the data in every time interval, called an epoch. The epoch length is one week.

This budget applies to the impressions that are registered with the user agent and later queried, not conversions.

From the perspective of the analysis [PPA-DP], each week of impressions forms a separate database. A finite privacy budget is enforced across all the queries made on each database.

Having a conversion report produced from impressions that span multiple weeks has privacy consequences. A single visit to a website can give that site information about activities across many weeks. This only requires that the conversion site is identified as the destination for impressions over that entire period. The number of weeks that can be queried is limited by user agents.
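The following non-normative sketch shows one way a user agent might work out which epochs a conversion report touches, given a lookback period and a cap on the number of epochs. It assumes that epochs are numbered from 0 starting at a fixed start time; the function name `epochsToQuery` and the `maxEpochs` parameter are hypothetical, and this specification does not set a value for the cap.

```typescript
// Hypothetical sketch; epoch numbering and the cap are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const EPOCH_MS = 7 * DAY_MS; // the epoch length is one week

// Returns the epoch indices to search, newest first.
// The range is bounded by the lookback period and by a user agent cap.
function epochsToQuery(
  nowMs: number,
  lookbackDays: number,
  maxEpochs: number,
  startMs: number = 0
): number[] {
  const newest = Math.floor((nowMs - startMs) / EPOCH_MS);
  const oldestByLookback = Math.floor((nowMs - lookbackDays * DAY_MS - startMs) / EPOCH_MS);
  // Never go earlier than epoch 0, and never return more than maxEpochs epochs.
  const oldest = Math.max(oldestByLookback, newest - maxEpochs + 1, 0);
  const epochs: number[] = [];
  for (let e = newest; e >= oldest; e--) {
    epochs.push(e);
  }
  return epochs;
}

// Day 30 falls in epoch 4; a 14-day lookback reaches back to day 16, in epoch 2.
console.log(epochsToQuery(30 * DAY_MS, 14, 10)); // [ 4, 3, 2 ]
```

A smaller cap trims the oldest epochs first, so the most recent impressions remain eligible.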

The goal is to set an epoch that is as large as feasible. A longer period of time allows for a better privacy/utility balance because sites can be allocated a larger overall budget at any point in time, while keeping the overall rate of privacy loss low. However, a longer interval means that it is easier to exhaust a privacy budget completely, yielding no information until the next refresh.

The choice of a week is largely arbitrary. One week is expected to be enough to allow sites the ability to make decisions about how to spend privacy budgets without careful planning that needs to account for changes that might occur days or weeks in the future.

§ 6.2 Privacy Budgets describes the process for budgeting in more detail.

6.2. Privacy Budgets

Browsers maintain a privacy budget, which is a means of limiting the amount of privacy loss.

This specification uses an individual form of (ε, δ)-differential privacy as its basis. In this model, privacy loss is measured using the value ε. The δ value is handled by the aggregation service when adding noise to aggregates.

Each user agent instance is responsible for managing privacy budgets.

Each conversion report that is requested specifies an ε value that represents the amount of privacy budget that the report consumes and a maximum on the value that can be returned in the conversion report.
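As a non-normative illustration, the sketch below mirrors the PrivateAttributionConversionOptions dictionary from the IDL Index and applies its default values, including `epsilon = 1.0` and `maxValue = 1`. The helper name `resolveConversionOptions`, the aggregator name used in the example, and the two validation rules (a positive, finite ε and `value` no greater than `maxValue`) are assumptions made for illustration; this section does not state them as requirements.

```typescript
// Mirrors the PrivateAttributionConversionOptions dictionary; helper names are hypothetical.
type PrivateAttributionLogic = "last-touch";

interface ConversionOptions {
  aggregator: string;
  epsilon?: number;
  histogramSize: number;
  logic?: PrivateAttributionLogic;
  value?: number;
  maxValue?: number;
  lookbackDays?: number;
  filterData?: number;
  impressionSites?: string[];
  intermediarySites?: string[];
}

type ResolvedConversionOptions = ConversionOptions & {
  epsilon: number;
  logic: PrivateAttributionLogic;
  value: number;
  maxValue: number;
  impressionSites: string[];
  intermediarySites: string[];
};

// Applies the IDL defaults, then runs two assumed sanity checks.
function resolveConversionOptions(o: ConversionOptions): ResolvedConversionOptions {
  const resolved: ResolvedConversionOptions = {
    aggregator: o.aggregator,
    histogramSize: o.histogramSize,
    epsilon: o.epsilon !== undefined ? o.epsilon : 1.0,
    logic: o.logic !== undefined ? o.logic : "last-touch",
    value: o.value !== undefined ? o.value : 1,
    maxValue: o.maxValue !== undefined ? o.maxValue : 1,
    lookbackDays: o.lookbackDays,
    filterData: o.filterData,
    impressionSites: o.impressionSites !== undefined ? o.impressionSites : [],
    intermediarySites: o.intermediarySites !== undefined ? o.intermediarySites : [],
  };
  // Assumption: epsilon is a positive, finite amount of privacy budget.
  if (!(resolved.epsilon > 0) || !isFinite(resolved.epsilon)) {
    throw new RangeError("epsilon must be positive and finite");
  }
  // Assumption: the reported value cannot exceed the declared maximum.
  if (resolved.value > resolved.maxValue) {
    throw new RangeError("value must not exceed maxValue");
  }
  return resolved;
}

// "aggregator.example" is a placeholder, not a real aggregation service.
const example = resolveConversionOptions({ aggregator: "aggregator.example", histogramSize: 8 });
console.log(example.epsilon, example.maxValue); // 1 1
```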

6.2.1. Privacy Budget Deduction

When searching for impressions for the conversion report, the user agent deducts the specified ε value from the budget for the week in which those impressions were saved. If the privacy budget for that week is not sufficient, the impressions from that week are not used.

The details of how to deduct privacy budget are given below ... WIP

In the following figure, impressions are recorded from a number of different sites, shown with circles.
[Figure: a timeline with one row per site (Site A through Site E) and one column per week (week 1 through week 4, followed by the partial current week that ends at "now"). Circles mark recorded impressions; black circles mark the impressions selected for the conversion report.]
An example of a store of impressions over time

A conversion report might be requested at the time marked with "now". That conversion report selects impressions marked with black circles, corresponding to impressions from Site B, C, and E.

As a result, privacy budget for the querying site is deducted from weeks 1, 3, 4, and 5. No impressions were recorded for week 2, so no budget is deducted from that week.

How a user agent manages exhaustion of a privacy budget depends on the attribution logic that was specified.
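Because the detailed deduction procedure is still being written, the following is only a rough, non-normative sketch of the behavior described in this section: ε is deducted from each week that holds matching impressions, and a week with insufficient budget is skipped so that its impressions are not used. The function name `deductForEpochs`, the `initialBudget` parameter, and the budget values in the example are hypothetical. The epoch numbers reuse weeks 1, 3, 4, and 5 from the example above.

```typescript
// Hypothetical sketch of per-epoch deduction for one querying site.
// `remaining` maps an epoch index to the budget left for that epoch;
// epochs with no entry have not been charged yet and start at `initialBudget`.
function deductForEpochs(
  remaining: Record<number, number>,
  epochsWithImpressions: number[],
  epsilon: number,
  initialBudget: number
): { used: number[]; skipped: number[] } {
  const used: number[] = [];
  const skipped: number[] = [];
  for (let i = 0; i < epochsWithImpressions.length; i++) {
    const epoch = epochsWithImpressions[i];
    const available = remaining[epoch] !== undefined ? remaining[epoch] : initialBudget;
    if (available >= epsilon) {
      // Enough budget: charge this epoch and allow its impressions to be used.
      remaining[epoch] = available - epsilon;
      used.push(epoch);
    } else {
      // Not enough budget: leave the balance alone and ignore these impressions.
      skipped.push(epoch);
    }
  }
  return { used: used, skipped: skipped };
}

// Week 3 was mostly spent by earlier queries (assumed values).
const remainingBudget: Record<number, number> = { 3: 0.5 };
const outcome = deductForEpochs(remainingBudget, [1, 3, 4, 5], 1, 3);
console.log(outcome.used, outcome.skipped); // [ 1, 4, 5 ] [ 3 ]
```

Week 2 never appears in the input, so its budget is untouched, which matches the example above.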

6.2.2. Safety Limits

The basic privacy unit is vulnerable to attack by an adversary that is able to correlate activity for the same person across multiple sites.

Groups of sites can sometimes coordinate their activity, such as when they have shared ownership or strong agreements. A group of sites that can be sure that a particular visitor is the same person (using any means, including something like FedCM [FEDCM]) can combine information gained from this API.

This can be used to increase the rate at which a site gains information from attribution, proportional to the number of sites across which coordination occurs. The default privacy unit places no limit on the information released in this way.

To counteract this effect, user agents can implement safety limits, which are additional privacy budgets that do not consider site. Safety limits might be significantly higher than per-site budgets, so that they are not reached for most normal browsing activity. The goal would be to ensure that they take effect only during intensive activity or an attack.

As with the per-site privacy budget, it is critical that sites be unable to determine whether their request for a conversion report has caused a safety limit to be exceeded.
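A safety limit can be pictured as a second budget that is checked alongside the per-site budget. The following non-normative sketch covers a single epoch and assumes that a deduction only succeeds when both budgets can cover ε. The names `EpochBudgets` and `tryDeduct` and the numeric values are hypothetical; the values are kept small for readability, and a real safety limit would likely be much higher relative to the per-site budget.

```typescript
// Hypothetical sketch: budgets for one epoch, with a cross-site safety limit.
interface EpochBudgets {
  perSite: Record<string, number>; // remaining budget per querying site
  perSiteInitial: number;          // starting budget for a site not yet charged
  safetyRemaining: number;         // remaining budget shared across all sites
}

// Deducts epsilon from both budgets, or from neither.
// The result stays inside the user agent; it must never be revealed to the site.
function tryDeduct(budgets: EpochBudgets, site: string, epsilon: number): boolean {
  const siteRemaining =
    budgets.perSite[site] !== undefined ? budgets.perSite[site] : budgets.perSiteInitial;
  if (siteRemaining < epsilon || budgets.safetyRemaining < epsilon) {
    return false;
  }
  budgets.perSite[site] = siteRemaining - epsilon;
  budgets.safetyRemaining -= epsilon;
  return true;
}

// Four coordinating sites each try to spend their full per-site budget.
const epochBudgets: EpochBudgets = { perSite: {}, perSiteInitial: 1, safetyRemaining: 3 };
const results = ["a.example", "b.example", "c.example", "d.example"].map(
  (site) => tryDeduct(epochBudgets, site, 1)
);
console.log(results); // [ true, true, true, false ]
```

When a deduction fails, the user agent would still return a well-formed conversion report, so the site cannot tell which limit, if any, was reached.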

6.3. Differential Privacy Mechanisms

The specific mechanisms that are used depend on the type of aggregation service.
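For illustration only, the sketch below shows the Laplace mechanism, a standard differential privacy mechanism described in [DP], applied to a histogram of aggregated values. This specification does not state that any aggregation service uses this mechanism; the function names, the `sensitivity` parameter, and the injectable `uniform` source are assumptions. The Laplace mechanism provides pure ε-differential privacy, whereas a service working under (ε, δ)-differential privacy might use a different noise distribution.

```typescript
// Hypothetical sketch of the Laplace mechanism; not taken from any aggregation service.

// Draws one sample from a Laplace distribution with mean 0 and the given scale,
// using inverse transform sampling. `uniform` must return values in [0, 1).
function sampleLaplace(scale: number, uniform: () => number = Math.random): number {
  let u = uniform() - 0.5; // in [-0.5, 0.5)
  while (u === -0.5) {
    u = uniform() - 0.5; // redraw: this endpoint would map to an infinite sample
  }
  const sign = u < 0 ? -1 : 1;
  return -scale * sign * Math.log(1 - 2 * Math.abs(u));
}

// Adds independent Laplace noise with scale sensitivity / epsilon to each bucket.
// `sensitivity` is the most that one contribution can change the histogram (assumed input).
function addLaplaceNoise(
  histogram: number[],
  sensitivity: number,
  epsilon: number,
  uniform: () => number = Math.random
): number[] {
  const scale = sensitivity / epsilon;
  return histogram.map((count) => count + sampleLaplace(scale, uniform));
}

// Each run produces different noisy totals.
console.log(addLaplaceNoise([120, 45, 0, 7], 1, 1.0));
```

A smaller ε gives a larger noise scale, which hides individual contributions better at the cost of less accurate aggregates. A production implementation would also need a cryptographically secure randomness source rather than `Math.random`.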

7. Security Considerations

7.1. Impression Store

The impression store used by the Private Attribution API holds information related to browsing activity and persists across browsing sessions. Although the flow of information through the impression store is strictly controlled, it carries some amount of information across origins.

The following measures limit the possibility of harmful information flow through the impression store:

7.2. API Implementation

The Private Attribution APIs must be implemented carefully to maintain the required security and privacy properties. A site calling the APIs must not be able to learn:

Note that explicit return values or thrown exceptions are not the only way that a site can learn from the Private Attribution APIs. It may be possible to infer sensitive information from side channels like:

While complete elimination of all side channels is impractical, implementations must make reasonable efforts to prevent leakage of sensitive information from the attribution APIs. Strategies to prevent leakage include:

7.3. Aggregation Services

Although not part of the web platform, security of aggregation services is quite important to the overall security of the Private Attribution mechanism. Conversion reports produced by measureConversion() are encrypted to cryptographic key(s) of the aggregation service. Thus, much of the potential for disclosure of the information contained in these reports depends on the details of the aggregation service.

User agent developers should carefully consider the design of an aggregation service and the trustworthiness of the aggregation service operator before adding it as a supported service for the Private Attribution API. Additional discussion of these issues may be found in § 5 Aggregation and § 8 Privacy Considerations.

7.4. Combining Reports from Multiple Sites

The privacy mechanisms in the Private Attribution API operate primarily at the granularity of sites. A malicious operator may attempt to register impressions for multiple sites, thus exceeding the amount of information that would otherwise be released through private attribution. § 6.2.2 Safety Limits discusses establishing additional cross-site privacy budgets to mitigate this possibility.

Rate limits on calls to the Private Attribution APIs could also be an effective mechanism to prevent harvesting information through overuse of the APIs.

7.5. Ad Fraud

As with many technologies, advertising on the web has been the subject of various kinds of fraud.

Fraudulent registration of impressions is a particular concern with the Private Attribution API, because impressions are stored only on the device. It is not possible to apply server-side intelligence to identify fraudulent impressions and exclude them from attribution. Conversely, even though conversion reports are encrypted, because the reports are sent to a server, the server can make a determination that the conversion is likely fraudulent and exclude it from aggregation.

An important mitigation against malicious use of the Private Attribution APIs is the explicit specification of eligible conversion sites when registering an impression, and of eligible impression sites and ad IDs when registering a conversion. This prevents impressions on arbitrary malicious sites from interfering with attribution to the intended set of candidate impressions.
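The effect of this two-sided matching can be sketched as a filter over stored impressions. This is a non-normative illustration: the `StoredImpression` shape and the function `eligibleImpressions` are hypothetical, ad IDs are not modeled, and treating an empty `impressionSites` list as "any impression site" is an assumption based only on the empty default in the IDL.

```typescript
// Hypothetical sketch of matching impressions against a conversion.
interface StoredImpression {
  impressionSite: string; // site where the impression was registered
  conversionSite: string; // site the impression named as eligible to convert
  histogramIndex: number;
}

// Keeps only impressions that named this conversion site and that come from
// a site the conversion listed (an empty list is assumed to allow any site).
function eligibleImpressions(
  store: StoredImpression[],
  conversionSite: string,
  impressionSites: string[]
): StoredImpression[] {
  return store.filter(
    (imp) =>
      imp.conversionSite === conversionSite &&
      (impressionSites.length === 0 || impressionSites.indexOf(imp.impressionSite) !== -1)
  );
}

const impressionStore: StoredImpression[] = [
  { impressionSite: "news.example", conversionSite: "shop.example", histogramIndex: 1 },
  { impressionSite: "evil.example", conversionSite: "shop.example", histogramIndex: 2 },
  { impressionSite: "news.example", conversionSite: "other.example", histogramIndex: 3 },
];
const matched = eligibleImpressions(impressionStore, "shop.example", ["news.example"]);
console.log(matched.map((imp) => imp.histogramIndex)); // [ 1 ]
```

The impression registered on `evil.example` is ignored even though it names the right conversion site, because the conversion did not list that site.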

8. Privacy Considerations

8.1. Information Exposed by the Private Attribution API

The impression store and privacy budget store contain information about a cross-section of browsing activity. As use of the API increases, so does the scope of this information. However, most of the information written to these stores is never disclosed. Because attribution is performed on the device (on-device attribution), only information about attributed conversions is exposed by the Private Attribution API. This contrasts with other schemes in which information about both impressions and conversions is sent to the aggregation service for off-device attribution. In the latter class of schemes, the amount of information that could be revealed in a compromise of the aggregation service (or in a compromise of communication with the aggregation service) is significantly larger.

When the Private Attribution API makes an attribution, information about that attribution is released from the device only to the extent the differential privacy restrictions allow.

While the Private Attribution API is intended to measure the association of relatively infrequent conversion events with a limited set of related impression candidates, it is important to consider how the API might be misused for larger-scale data collection. The requirement that impressions enumerate the possible conversion sites (and vice-versa) has an important role in preventing misuse of the API for mass data collection, and in making attempts at such misuse more visible.

It is unclear whether the privacy budget store should be cleared whenever the impression store is cleared. On one hand, it contains information about browsing activity, so it is desirable to include it when clearing browsing activity. On the other hand, it is only possible to strictly adhere to the requirements of the differential privacy mechanism if information about a fully or partially depleted privacy budget is maintained until that budget is no longer relevant (i.e., the end of the week).

8.2. Disabling the Private Attribution API

The Private Attribution API is designed to reveal only aggregate information. The use of differential privacy limits the chance of determining whether any particular user contributed to the aggregated output. However, some users may still prefer not to participate in attribution measurement. As discussed in § 3.7.1 Optional Participation, the user agent must provide a mechanism for the user to disable the Private Attribution API.

To minimize the risk of fingerprinting, and to prevent discrimination against users who choose to disable the Private Attribution API, sites must not be able to detect that the API is disabled. Specifically, all calls to the Private Attribution API that are otherwise valid must complete successfully, even when the API is disabled. The only difference in behavior is that conversion reports returned when the API is disabled will never report any conversion value. Because the reports are encrypted, this difference cannot be detected by the site receiving the conversion report.
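The following non-normative sketch shows the idea for the histogram carried by a conversion report: when the API is disabled, the histogram has the same size but contains no conversion value. The function name `buildHistogram` and its parameters are hypothetical, and the encryption applied to real conversion reports is not modeled here.

```typescript
// Hypothetical sketch: the plaintext histogram behind a conversion report.
function buildHistogram(
  histogramSize: number,
  attributedIndex: number | null, // index of the attributed impression, if any
  value: number,
  apiEnabled: boolean
): number[] {
  const histogram: number[] = [];
  for (let i = 0; i < histogramSize; i++) {
    histogram.push(0);
  }
  // A disabled API, or a conversion with no attributed impression, reports nothing.
  if (
    apiEnabled &&
    attributedIndex !== null &&
    attributedIndex >= 0 &&
    attributedIndex < histogramSize
  ) {
    histogram[attributedIndex] = value;
  }
  return histogram;
}

console.log(buildHistogram(4, 2, 1, true));  // [ 0, 0, 1, 0 ]
console.log(buildHistogram(4, 2, 1, false)); // [ 0, 0, 0, 0 ]
```

Both outputs have the same length, so once encrypted they are indistinguishable to the site that receives them.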

8.3. Use in Third-party Contexts

The Private Attribution API is available even in third-party contexts. In particular, a third-party iframe may call saveImpression(). Note, however, that the impression is recorded with the site of the top-level navigation context, not the origin of the iframe.

While the availability of the API in third-party contexts carries some increase in privacy risk, this support is deemed necessary because iframes are commonly used to display advertisements.

9. Acknowledgements

This specification is the result of a lot of work from many people. The broad shape of this level of the API is based on an idea from Luke Winstrom. The privacy architecture is courtesy of the authors of [PPA-DP].

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[HR-TIME-3]
Yoav Weiss. High Resolution Time. URL: https://w3c.github.io/hr-time/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[PRIVACY-PRINCIPLES]
Robin Berjon; Jeffrey Yasskin. Privacy Principles. URL: https://w3ctag.github.io/privacy-principles/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[COOKIES]
A. Barth. HTTP State Management Mechanism. April 2011. Proposed Standard. URL: https://httpwg.org/specs/rfc6265.html
[COPPACALYPSE]
Garrett Johnson; et al. COPPAcalypse? The Youtube Settlement's Impact on Kids Content. 2024-03-14. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4430334
[DP]
Cynthia Dwork; Aaron Roth. The Algorithmic Foundations of Differential Privacy. 2014. URL: https://doi.org/10.1561/0400000042
[EU-AD]
Niklas FOURBERG; et al. Online advertising: the impact of targeted advertising on advertisers, market access and consumer choice. 2021-06. URL: https://www.europarl.europa.eu/thinktank/en/document/IPOL_STU(2021)662913
[FEDCM]
Nicolas Pena Moreno. Federated Credential Management API. URL: https://w3c-fedid.github.io/FedCM/
[FREE-GDP]
Leonard Nakamura; Jon D. Samuels; Rachel Soloveichik. Measuring the "Free" Digital Economy within the GDP and Productivity Accounts. 2017-10. URL: https://www.bea.gov/research/papers/2017/measuring-free-digital-economy-within-gdp-and-productivity-accounts
[ONLINE-ADVERTISING]
Avi Goldfarb; Catherine Tucker. Online Advertising. URL: http://www-2.rotman.utoronto.ca/~agoldfarb/OnlineAdvertising.pdf
[PPA-DP]
Pierre Tholoniat; et al. Cookie Monster: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems. URL: https://arxiv.org/abs/2405.16719
[UNSANCTIONED-TRACKING]
Mark Nottingham. Unsanctioned Web Tracking. 17 July 2015. TAG Finding. URL: http://www.w3.org/2001/tag/doc/unsanctioned-tracking/

IDL Index

dictionary PrivateAttributionAggregationService {
  required DOMString name;
  required DOMString apiVersion;
};

[SecureContext, Exposed=Window]
interface PrivateAttribution {
  attribute FrozenArray<PrivateAttributionAggregationService> aggregationServices;
};

dictionary PrivateAttributionImpressionOptions {
  required unsigned long histogramIndex;
  required unsigned long filterData;
  required DOMString conversionSite;
  unsigned long lifetimeDays;
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] undefined saveImpression(PrivateAttributionImpressionOptions options);
};

dictionary PrivateAttributionConversionOptions {
  required DOMString aggregator;
  double epsilon = 1.0;

  required unsigned long histogramSize;

  PrivateAttributionLogic logic = "last-touch";
  unsigned long value = 1;
  unsigned long maxValue = 1;

  unsigned long lookbackDays;
  unsigned long filterData;
  sequence<DOMString> impressionSites = [];
  sequence<DOMString> intermediarySites = [];
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  [Throws] Promise<Uint8Array> measureConversion(PrivateAttributionConversionOptions options);
};

enum PrivateAttributionLogic {
  "last-touch",
};

Issues Index

This section needs to be more precise about site vs. origin.
Is any additional information required in the PrivateAttributionAggregationService dictionary? Do we want to rename apiVersion to protocol? And we should definitely define an enum for it.
The privacy budget store needs to be described in more detail. Some references to clearing the impression store may need to be updated to refer to the privacy budget store as well.
Rate limits on calls to the Private Attribution APIs could also be an effective mechanism to prevent harvesting information through overuse of the APIs.
It is unclear whether the privacy budget store should be cleared whenever the impression store is cleared. On one hand, it contains information about browsing activity, so it is desirable to include it when clearing browsing activity. On the other hand, it is only possible to strictly adhere to the requirements of the differential privacy mechanism if information about a fully or partially depleted privacy budget is maintained until that budget is no longer relevant (i.e., the end of the week).