Beaconing In Practice: fetchLater()

July 25th, 2024

Table of Contents

  1. Introduction
  2. fetchLater() API
  3. Why Deferred Fetches?
  4. Evolution from Pending Beacon
  5. What I Got Wrong Last Time
  6. fetchLater() Experiments
    1. Methodology
    2. Reliability of XMLHttpRequest vs. sendBeacon() vs. fetchLater() in Event Handlers
      1. onload
      2. pagehide or visibilitychange
      3. onload or pagehide or visibilitychange
      4. Conclusion
    3. Reliability of fetchLater() using activateAfter
  7. Follow-Ups
  8. How We’re Going to Use it
  9. TL;DR

Introduction

This is a follow-up to the post Beaconing in Practice: An Update on Reliability and the Pending Beacon API, which itself is a follow-up to an article titled Beaconing In Practice. These articles cover all aspects of sending telemetry from your web app to a back-end server for analysis (aka "beaconing").

In the past year, the Pending Beacon API has evolved like a Pokémon and is now called the fetchLater() API. I think the new API shape is more ergonomic, more reliable, and a good step forward.

In this article, I will review the updated API and see how it stacks up to its predecessor, the Pending Beacon API, as well as the standard way of beaconing on the web via XMLHttpRequest (XHR) and sendBeacon(). Some of the content of this article will look similar to the last one, with some additional content for how the API has evolved, and newer findings from experimentation.

A summary of where we left off last time:

  • The Pending Beacon API was showing great promise, giving developers better ergonomics for sending data, and a more reliable way to send beacons at the end of the page lifetime
  • There were a few scenarios in which Pending Beacon seemed less reliable than using sendBeacon():
    • During onload Pending Beacon (with timeout:0) was about 1.2% less reliable than sendBeacon()
    • During pagehide and visibilitychange Pending Beacon (with timeout:0), on Mobile, was about 17.9% less reliable than sendBeacon()
    • When I reviewed my methodology with Chrome engineers, they pointed out that I had forgotten to call .sendNow() in the scenarios where it should be used; details on that suggestion below.
  • Pending Beacon requests were hard to debug due to the beacons not showing up in Chrome Developer Tools.
    • Since the API is now fetch()-based, this has been resolved and they now show up. Great!

fetchLater() API

The fetchLater() API is an evolution of the Pending Beacon API (based on feedback from the community and the other browser vendors), and it aims to allow developers to send a "deferred" fetch().

Why would you want to defer your fetches? A primary use-case is for beaconing data from a web app for analysis/analytics purposes. Deferred fetches can be useful when exfiltrating telemetry, i.e. when that beacon contains a payload that is not required for building the webpage or presenting anything to the visitor.

The goal of fetchLater() is to provide an API to developers where they can "queue" data to be sent at a later time: either after a timeout, or at the point the page is about to be unloaded.

This helps developers avoid having to explicitly send beacons themselves in events like pagehide or visibilitychange (which don’t always fire reliably).

The API looks similar to a regular fetch(), which developers should be familiar with.

Here’s an example of using the fetchLater() API to send a beacon when the page is being unloaded (or a maximum of 60 seconds after "now"):

// queue a beacon for the unloading or +60s
fetchLater(beaconUrl, {
  activateAfter: 60000
});

The API is still being discussed, and is actively evolving based on community and browser vendor feedback.

If you want to experiment with fetchLater() in Chrome today, you can register for an Origin Trial for Chrome 121-126.

Why Deferred Fetches?

One of the challenges highlighted in the Beaconing In Practice article is how to reliably send data once it’s been gathered in a web app.

Developers frequently use events such as beforeunload/unload or pagehide/visibilitychange as a trigger for beaconing their data, but these events are not reliably fired on all platforms. If the events don’t fire, the beacons don’t get sent.

For example, if you want to gather all of your data and only send it once as the page is unloading, registering for all 4 of those events will only give you ~82.9% reliability in ensuring the data arrives at your server, even when using the sendBeacon() API.

So, wouldn’t it be lovely if developers had a more reliable way of "queuing" data to be sent, and have the browser automagically send it once the page starts to unload? That’s where fetchLater() comes in.

The fetchLater() API gives developers a way to build a "deferred" beacon. That deferred beacon will then be sent at the timeout, or, as the page is unloading. It can also be aborted before then, if desired. As a result, developers no longer need to listen to the beforeunload/unload/pagehide/visibilitychange events to send data.

Ideally, fetchLater() will be a mechanism that can replace usage of sendBeacon() in browsers that support it, giving more reliable delivery of beacon data and better developer ergonomics (by not having to listen for, and send data during, unload-ish events).
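
As a rough sketch of what that replacement could look like, a library might feature-detect fetchLater() and fall back to sendBeacon(). This is not code from boomerang.js or the experiment; the queueBeacon name and the env parameter are made up for illustration, and the 60-second activateAfter simply mirrors the earlier example:

```javascript
// Hypothetical helper: prefer fetchLater() where the browser exposes it,
// otherwise fall back to navigator.sendBeacon().
// `env` defaults to the global object; it is a parameter only so the
// function can be exercised outside of a browser.
function queueBeacon(beaconUrl, env) {
  env = env || globalThis;

  if (typeof env.fetchLater === "function") {
    // deferred: sent at unload, or after 60s, whichever comes first
    env.fetchLater(beaconUrl, { activateAfter: 60000 });
    return "fetchLater";
  }

  if (env.navigator && typeof env.navigator.sendBeacon === "function") {
    // not deferred: sendBeacon() queues the request right away
    env.navigator.sendBeacon(beaconUrl);
    return "sendBeacon";
  }

  return "unsupported";
}
```

Note that the two branches do not behave the same way: the sendBeacon() fallback sends immediately, so a library using it would still need its unload-ish event listeners to decide when to call it.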

Evolution from Pending Beacon

fetchLater() evolved from the Pending Beacon API, based on feedback from other browser vendors and the web performance community.

Pending Beacon was a brand new API that allowed you to configure a few timeouts, send/update the payload, and force the beacon out immediately:

var pb = new window.PendingGetBeacon(beaconUrl, {
    timeout: 0,
    backgroundTimeout: -1
});
pb.setData(1);
pb.sendNow();
// or
pb.deactivate();

Rather than creating an entirely new PendingGetBeacon() interface, fetchLater() is merely a mirror of fetch() with one additional optional parameter (activateAfter). The deferred fetch can still be aborted (via an AbortController) like a normal fetch().

fetchLater(beaconUrl, {
    activateAfter: 0
});

// can't be updated in place, but you can abort it with an AbortController and queue a new one
// no need for .sendNow()
// can be deactivated with an AbortController
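
To make the abort-and-replace idea concrete, here is a minimal sketch that drops the previously queued beacon and queues a new one carrying the latest data. The createReplaceableBeacon name, the d query parameter, the 60-second delay, and the injectable fetchLaterFn argument are my own assumptions for illustration, not part of the API:

```javascript
function createReplaceableBeacon(beaconUrl, fetchLaterFn) {
  // in a browser, fall back to the global fetchLater()
  var send = fetchLaterFn || function (url, init) { return fetchLater(url, init); };
  var controller = null;

  return {
    // queue (or re-queue) the beacon with the latest data
    update: function (data) {
      if (controller) {
        controller.abort();          // drop the previously queued beacon
      }
      controller = new AbortController();
      // assumes beaconUrl has no query string of its own
      send(beaconUrl + "?d=" + encodeURIComponent(data), {
        activateAfter: 60000,
        signal: controller.signal
      });
    },
    // discard the queued beacon entirely
    cancel: function () {
      if (controller) {
        controller.abort();
        controller = null;
      }
    }
  };
}
```

One caveat with this shape: if the earlier beacon has already been sent (for example, its delay elapsed), aborting it has nothing left to cancel, so the next update() results in a second beacon rather than a replacement.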

One other difference with PendingBeacon was that it had a backgroundTimeout option, which would send a beacon after the specified number of milliseconds when the page entered the next hidden visibility state (or was abandoned):

var pb = new window.PendingGetBeacon(beaconUrl, {
    backgroundTimeout: 1000
});

This behavior is not available in fetchLater(), though you could replicate it manually:

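// queue the deferred beacon, keeping a controller so it can be aborted later
var controller = new AbortController();
fetchLater(beaconUrl, { signal: controller.signal });

document.addEventListener("visibilitychange", () => {
  if (document.hidden) {
    setTimeout(function() {
      if (document.hidden) {
        // still hidden after 1s: drop the queued beacon and send one right away
        controller.abort();
        fetchLater(beaconUrl, { activateAfter: 0 });
      }
    }, 1000);
  }
});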

This feels more straightforward to use, and avoids one of the traps I fell into when experimenting with Pending Beacon last time (see next section).

What I Got Wrong Last Time

When I was experimenting with Pending Beacon last year, there were two big issues I found with regard to reliability:

  • During onload Pending Beacon (with timeout:0) was about 1.2% less reliable than sendBeacon()
  • During pagehide and visibilitychange Pending Beacon (with timeout:0), on Mobile, was about 17.9% less reliable than sendBeacon()

Both of these scenarios utilized Pending Beacon with a { timeout: 0 } option, meaning I was asking the browser to send the beacon right away.

Here’s example code for what it looked like:

new window.PendingGetBeacon(beaconUrl, {
    timeout: 0,
    backgroundTimeout: -1
});

What I missed, however, was that the Pending Beacon interface had a method .sendNow() that would tell the browser to actually send it immediately.

Here’s what I should have done:

let b = new window.PendingGetBeacon(beaconUrl, {
    timeout: 0,
    backgroundTimeout: -1
});
b.sendNow(); // <-- forgot to do this last time

In talking with the Chrome engineers, we think that excluding the .sendNow() may have caused the drop in reliability — timeout: 0 alone wasn’t enough to force the beacon to send right away.

This was especially important in the page-is-unloading scenario (in pagehide and visibilitychange listeners) as not forcing with .sendNow() meant the browser didn’t prioritize sending the payload prior to exiting the page/app.

fetchLater() Experiments

Given those goals, I was curious to see how reliable fetchLater() would be compared to existing APIs like XMLHttpRequest (XHRs) or the sendBeacon() API. I performed several experiments comparing how reliably data arrived after using one of those APIs in different scenarios.

Let’s explore these questions:

  1. Can we swap fetchLater() in for usage of XHR and/or sendBeacon() in unload event handlers?
  2. How reliable is using only fetchLater()’s activateAfter, rather than listening to event handlers?

Where possible, I will also mention how fetchLater() compares with the previous API shape (Pending Beacon).

Methodology

Over the course of 3 months, on a site that I control (with approximately 2.5 million samples), I ran an experiment gathering data from browsers using the following three APIs:

  • XMLHttpRequest (XHR)
  • navigator.sendBeacon()
  • fetchLater()

An A/B/C experiment was run distributing the test across those APIs, which all sent a small GET request (~100 bytes) back to the same domain / origin.

For all of the data below, I am only looking at Chrome and Chrome Mobile v121-126 (per the User-Agent string) with support for window.fetchLater(), to ensure a level playing field. The data in Beaconing In Practice looks at reliability across all User-Agents, but the experiments below will focus solely on browsers supporting the fetchLater() API.

(It appears Edge, Opera and Samsung Internet Browser participate in Origin Trials and are sending data as well. I excluded those UAs to keep the results consistent)

Reliability of XMLHttpRequest vs. sendBeacon() vs. fetchLater() in Event Handlers

The first question I wanted to know was: Can fetchLater() be easily swapped into existing analytics libraries (like boomerang.js) to replace sendBeacon() and XMLHttpRequest (XHR) usage, and retain the same (or better) reliability (beacon received rate)?

In boomerang for example, we listen to beforeunload and pagehide to send our final "unload" beacon. Can we just use fetchLater() with { activateAfter: 0 } in those events instead?

For this experiment, I segmented visitors into 3 equally-distributed A/B/C groups (given fetchLater() support):

  • A: Force fetchLater() (with { activateAfter: 0 } so it was sent immediately)
  • B: Force navigator.sendBeacon()
  • C: Force XMLHttpRequest
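
A minimal sketch of how such a split could be wired up follows. The post does not describe the actual assignment mechanism, so pickGroup(), sendWithGroup() and the random-draw approach are assumptions for illustration only:

```javascript
var GROUPS = ["fetchLater", "sendBeacon", "xhr"];

// pick one of three equally likely groups;
// `random` is injectable only so the function can be tested
function pickGroup(random) {
  var r = (random || Math.random)();   // value in [0, 1)
  return GROUPS[Math.floor(r * GROUPS.length)];
}

// send one beacon using the API for the visitor's group (browser only)
function sendWithGroup(group, beaconUrl) {
  if (group === "fetchLater") {
    fetchLater(beaconUrl, { activateAfter: 0 });
  } else if (group === "sendBeacon") {
    navigator.sendBeacon(beaconUrl);
  } else {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", beaconUrl, true);
    xhr.send();
  }
}
```

The point of the split is that each visitor uses exactly one API for every beacon on the page, so the received rates of the three groups can be compared directly.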

Each group then attempted to send 6 beacons per page load:

  1. Immediately in the <head> of the HTML
  2. In the page onload event
  3. In the page beforeunload event
  4. In the page unload event
  5. In the page pagehide event
  6. In the page visibilitychange event (for hidden)

By seeing how often each of those beacons arrived, we can consider the reliability of each API, during different page lifecycle events. I’m only including results for page loads where the first step (sending data immediately in the <head>) occurred.
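
To make the counting concrete, here is a small sketch of how a received rate could be computed from server-side records. The record shape, the event names, and the sample data are all made up for illustration; they are not the experiment's data or tooling:

```javascript
// Toy data, made up for illustration: one entry per page load, listing
// which beacons the server received for that load.
var pageLoads = [
  { received: ["head", "load", "pagehide"] },
  { received: ["head", "load", "pagehide", "visibilitychange"] },
  { received: ["head", "visibilitychange"] },
  { received: ["head"] },
  { received: ["load"] }              // no "head" beacon: excluded
];

// share of page loads (with a "head" beacon) that also delivered
// at least one of the given events
function reliability(loads, events) {
  var base = loads.filter(function (l) {
    return l.received.indexOf("head") !== -1;
  });
  var hits = base.filter(function (l) {
    return events.some(function (e) { return l.received.indexOf(e) !== -1; });
  });
  return base.length ? hits.length / base.length : 0;
}

reliability(pageLoads, ["load"]);                            // 0.5
reliability(pageLoads, ["pagehide", "visibilitychange"]);    // 0.75
```

Passing several event names answers the "either event" questions used later in the post: a page load counts as a hit if any one of the listed beacons arrived.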

Let’s break the experimental data down by event first:

onload

The onload event is probably the most common event for an analytics library to fire a beacon. Marketing and performance analytics tools will often send their main payload at that point in time.

Here’s example code you could use to send data at onload:

function sendTheBeacon() {
    // XHR
    var xhr = new XMLHttpRequest();
    xhr.open('GET', beaconUrl, true);
    xhr.send();

    // sendBeacon
    navigator.sendBeacon(beaconUrl);

    // fetchLater
    fetchLater(beaconUrl, { activateAfter: 0 }); 
}

window.addEventListener("load", sendTheBeacon, false);

Based on our experimentation, when firing a beacon just at the onload event, fetchLater() appears to be slightly more reliable than sendBeacon() and XHR:

reliability at onload

The numbers are very close, though: with approximately a half-million samples in each bucket, there is less than a 1% difference between the three APIs.

This result differs from the Pending Beacon experimentation last year, which showed Pending Beacons arriving less reliably than sendBeacon(), likely because that experiment did not use .sendNow().

Broken down by Desktop and Mobile:

reliability at onload - desktop

reliability at onload - mobile

The results are ordered the same across desktop and mobile, and all three APIs are within 1% of each other.

Note that the above results only measure a beacon sent immediately during the page’s onload event, without accounting for any abandons that happen prior to onload. That is why these numbers are so low: if a user abandoned the page prior to the onload event, they would not be counted in the above chart. See the additional breakdowns below for how these numbers change if you use the suggested abandonment strategy of listening to onload, pagehide and visibilitychange.

Great news: fetchLater() seems to be at least as reliable as sendBeacon() and XHRs during the onload event!

pagehide or visibilitychange

If the intent is to measure events that occur in the page beyond the onload event, i.e. additional performance or reliability metrics (such as Core Web Vitals or JavaScript errors), tools can send a beacon during one of the page’s unload events, such as beforeunload, unload, pagehide or visibilitychange.

Our recommended strategy is to listen to just pagehide and visibilitychange (for hidden), and not listen to the beforeunload or unload events (which are less reliable and can break BFCache navigations).

Example code:

window.addEventListener("pagehide", sendTheBeacon, false);
window.addEventListener("visibilitychange", function() {
    if (document.visibilityState === 'hidden') {
        sendTheBeacon();
    }
}, false);

So let’s look at the result of sending a beacon immediately during a pagehide or visibilitychange event (if a beacon was received for either event):

reliability at pagehide or visibilitychange

Here we see that sendBeacon() has a slight edge over fetchLater() — about 0.5% more reliable.

XHR trails much farther behind, at only about 83% reliable. This is because XHRs can be aborted as the page is abandoned, or the user navigates away.

Let’s break it down by platform:

reliability at pagehide or visibilitychange - desktop

fetchLater() is nearly identical to sendBeacon() reliability on Desktop, with XHR trailing behind.

On Mobile:

reliability at pagehide or visibilitychange - mobile

fetchLater() trails a bit further behind sendBeacon() (1.1% less reliable).

I was hoping these pagehide and visibilitychange[hidden] numbers would mirror what we saw for onload, where fetchLater() was even slightly better than sendBeacon(). However, sendBeacon() appears to have a slight edge in reliability, most notably on mobile platforms when the page is unloading.

I will follow up with the Chrome team to determine if there’s anything that could be contributing to this.

onload or pagehide or visibilitychange

Finally, let’s combine the above three events per the suggested abandonment strategy, and see how reliable each API is if we’re listening for all 3 events (and sending data once in any of them).
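
A sketch of what sending data once across all three events might look like, assuming a sendTheBeacon function like the one in the onload example. The once() and listenForAll() helpers are hypothetical names, not part of any library mentioned here:

```javascript
// wrap a send function so that only the first call goes through
function once(fn) {
  var called = false;
  return function () {
    if (called) {
      return false;
    }
    called = true;
    fn();
    return true;
  };
}

// register the three listeners (browser only)
function listenForAll(sendTheBeacon) {
  var sendOnce = once(sendTheBeacon);
  window.addEventListener("load", sendOnce, false);
  window.addEventListener("pagehide", sendOnce, false);
  document.addEventListener("visibilitychange", function () {
    if (document.visibilityState === "hidden") {
      sendOnce();
    }
  }, false);
}
```

Whichever event fires first wins: a visitor who stays until onload sends the beacon there, while one who abandons earlier still gets a chance to send it from pagehide or visibilitychange.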

Of course, this increases the reliability of receiving beacons to the maximum possible, with sendBeacon() and fetchLater() able to get a beacon to the server over 98% of the time:

reliability at onload or pagehide or visibilitychange

Broken down by Desktop vs. Mobile, we see that Desktop has an extremely high rate of receiving beacons, 99% or more:

reliability at onload or pagehide or visibilitychange - desktop

Mobile shows slightly less reliable results, but still over 97% for sendBeacon() and fetchLater():

reliability at onload or pagehide or visibilitychange - mobile

Conclusion

From experimenting with using fetchLater() in event handlers, it seems to me that fetchLater() is nearly identical to sendBeacon() in reliability (and both are improvements over XHR).

If sending data during onload, fetchLater() is slightly more reliable than sendBeacon().

If sending data during pagehide or visibilitychange[hidden], sendBeacon() is slightly more reliable than fetchLater() (more pronounced on mobile). It’s probably worthwhile to look into why.

NOTE: I measured the reliability of sending beacons during beforeunload and unload as well, but since those events are deprecated / not-recommended / unreliable / break BFCache events, I’ll skip those results in this post.

Reliability of fetchLater() using activateAfter

Here’s an interesting experiment: Let’s say you want to send a beacon to your analytics service, but you don’t have a strong opinion on when that data should be sent.

You don’t necessarily want to send it at startup, as that network request could conflict with the page’s important assets.

As long as it’s sent by the time the page is unloading, that’s good enough!

One naive way you could do this is to use a setTimeout() with a delay of n milliseconds and call sendBeacon() much later, after the page has fully loaded:

window.addEventListener("load", function() {
    setTimeout(function() {
        navigator.sendBeacon(beaconUrl);
    }, 1000);
}, false);

If you didn’t take into account an abandonment strategy, and you tried different values of N milliseconds, your reliability rate might look like this:

sendBeacon() after setTimeout of N seconds

i.e. waiting 1 second after Page Load you’d only see 96.6% of beacons, while waiting for 60 seconds (and hoping visitors stick around on your page that long) results in only 24.1% of beacons arriving (on this example site).

Of course, you wouldn’t do this in real-life: you’d listen for pagehide and visibilitychange, but this shows a worst-case example.

Here’s where fetchLater() comes in: you can actually use it blindly like this, and have much more positive results! Just specify a { activateAfter: n } value for your preferred delay:

fetchLater(beaconUrl, { activateAfter: 1000 });

The fetchLater() results in doing this are pretty impressive:

fetchLater() after activateAfter of N seconds

Using a value of 1 second only results in 0.2% of beacons being lost, while a value of 600 seconds still gives you 93.7% of all beacons.

If you set activateAfter to a nearly-unlimited value (say 999999999999999), i.e. you ask fetchLater() to do all the heavy lifting and send the beacon whenever the page is abandoned, those beacons still arrive 92.3% of the time.

While that isn’t 100% of the time, it’s a lot better and more ergonomic than having to listen to onload, pagehide, visibilitychange, etc.

Our previous experimentation showed that if you want to "hold" your data for unload, listening to all 4 unload-ish events (beforeunload, unload, pagehide, visibilitychange) and sending a beacon in them only resulted in ~82.9% reliability! So fetchLater() is 9.4 percentage points (92.3% vs. 82.9%) more reliable here.

And in the meantime, the draft fetchLater() could be aborted and replaced with additional data up until the page unloads (at which point you could let the "last" values go out, or even replace it again with any at-unload data you want to update).

This reliability varies by platform. If we zoom into using { activateAfter: 60000 } (60s), we can see that Desktop (99.0%) is a lot more reliable than Mobile (90.4%):

fetchLater() 60s

Regardless, fetchLater() offers some unique benefits for sending data.

Follow-Ups

As with last time, I want to be open in saying that:

  • Some of my methodology may be flawed.
    • Last time I wasn’t using .sendNow() with Pending Beacon, and that affected the reliability in page-unloading scenarios.
    • Luckily, fetchLater() reduces the complexity a bit, and we now see reliability as-good-as or even better-than sendBeacon() most of the time
  • These results were captured in an A/B/C test on one of my personal websites.
    • Your results will vary!
    • I also have noticed that over time the numbers for all results shift slightly. My A/B/C experimentation was happening simultaneously though, so shouldn’t be affected by changes in time.
  • I’m open to review and criticism or feedback on other things to check.

Given that, there is one follow-up for fetchLater() that I would like to review:

  • Why is fetchLater() in pagehide and visibilitychange[hidden] slightly less reliable than sendBeacon()?
    • I only saw ~0.2% fewer beacons, but I was hoping it would be equal or better!

How We’re Going to Use it

Given the cool possibilities of fetchLater(), how do I envision taking advantage of it?

For boomerang.js (the RUM measurement tool we use at Akamai mPulse), we have a few different types of beacons we send:

  • Our load beacon at the onload event. This contains all of the performance information from the page.
  • An unload beacon at pagehide and beforeunload. This lets us know how long the user was reading the page.
  • Some websites have enabled an early beacon that gets sent immediately at page initialization, so we avoid any data loss from page abandonment (the user leaves before onload and when event handlers aren’t reliable). If the main beacon doesn’t come in, the early beacon data is used.
  • error beacons contain information about any JavaScript errors that occur during user interactions after the main beacon was sent.
  • spa beacons for Single Page App Soft Navigations.
  • (… and a few more obscure ones)

fetchLater() can help us get data more reliably in a few of these scenarios!

  • early beacons may no longer be necessary: we can queue up a fetchLater() with the same data, and abort it if we reach onload and send our regular data. This will reduce the number of beacons we send (and that we have to keep in memory in our infrastructure).
  • error beacons could be sent less often: right now our customers often choose to send batches of error beacons every 1 to 5 seconds, to ensure they arrive reliably. We could batch new errors into a fetchLater() beacon that only gets sent after 60 seconds, trusting the browser to deliver it (and appending new errors if they occur in the meantime).
  • It would take a bit of engineering to make such a drastic change to our library, but fetchLater() could allow us to combine our load and unload beacons into a single beacon that just gets sent as the page is unloading. (The downside of this is that data may not be as "real time" in our dashboards as it is today, which show beacons within 5-10 seconds of the Page Load happening.)
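
As a sketch of the first idea (replacing the early beacon), the flow could look roughly like this. It is an illustration, not boomerang.js code: setupEarlyBeacon, the earlyUrl and loadUrl endpoints, and the deps object are all hypothetical, and deps exists only so the logic can be exercised outside a browser:

```javascript
function setupEarlyBeacon(earlyUrl, loadUrl, deps) {
  var controller = new AbortController();

  // queue the "early" data; with no activateAfter, the browser holds it
  // until the page goes away
  deps.fetchLater(earlyUrl, { signal: controller.signal });

  // call the returned function from the page's load handler
  return function onLoad() {
    controller.abort();         // the early beacon is no longer needed
    deps.sendBeacon(loadUrl);   // send the regular load beacon instead
  };
}

// Browser usage (assumed wiring):
//   var onLoad = setupEarlyBeacon("/early", "/load", {
//     fetchLater: function (url, init) { return fetchLater(url, init); },
//     sendBeacon: function (url) { return navigator.sendBeacon(url); }
//   });
//   window.addEventListener("load", onLoad, false);
```

If the visitor abandons the page before onload, the abort never runs and the browser delivers the queued early data; otherwise only the regular load beacon is sent.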

We’re hoping to experiment with some of these ideas soon!

TL;DR

  • Last time I experimented with Pending Beacon, I had concerns with ergonomics (lack of Developer Tools support) and reliability (fewer beacons arriving than with sendBeacon()). Both of these are resolved!
  • I’m really excited for the fetchLater() API. It’s giving developers better ergonomics for sending data, and a more reliable way to send beacons at the end of the page lifetime.
  • The new fetchLater() API is in active development and going through a feedback and Origin Trial cycle.
  • I would suggest analytics libraries seriously consider utilizing the API if available (after the Origin Trial concludes).

Thanks for reading and your support! Please contact me with any feedback, questions, etc.

Beaconing in Practice: An Update on Reliability and the Pending Beacon API

May 26th, 2023

Table of Contents

  1. Introduction
  2. Pending Beacon API
  3. Why Pending Beacons?
  4. Pending Beacon Experiments
    1. Methodology
    2. Reliability of XMLHttpRequest vs. sendBeacon() vs. Pending Beacon in Event Handlers
      1. onload
      2. pagehide or visibilitychange
      3. onload or pagehide or visibilitychange
      4. Conclusion
    3. Reliability of Pending Beacon "now" vs "backgroundTimeout"
    4. Reliability of Pending Beacon "backgroundTimeout" once vs. sendBeacon() in Event Handlers
  5. Misc Findings
  6. Follow-Ups
  7. TL;DR

Introduction

A few years ago, I wrote an article titled Beaconing In Practice that covered all of the aspects of sending telemetry from your web app to a back-end server for analysis (aka "beaconing").

While the contents of that article are still relatively fresh and accurate, there are two new aspects of beaconing that I would like to cover in this post:

  • The new Pending Beacon API
  • The measured reliability of using XMLHttpRequest (XHR) vs. sendBeacon() vs. Pending Beacon for sending data

Pending Beacon API

The Pending Beacon API is an exciting new proposal from Google Chrome engineers.

The goal is to provide an API for developers where they can "queue" data to be sent when a page is being unloaded (by the browser, automatically), rather than requiring developers to explicitly send beacons themselves in events like pagehide or visibilitychange (which don’t always fire reliably).

It is meant to be similar to the navigator.sendBeacon() API, with a simple calling style.

Here’s an example of using the Pending Beacon API to send a beacon when the page is being hidden/unloaded (or a maximum of 60 seconds after "now"):

// queue a beacon for the unloading or +60s
var beacon = new window.PendingGetBeacon(
    beaconUrl,
    {
        timeout: 60000, 
        backgroundTimeout: 0 
    });

(note the above API shape is outdated and the Pending Beacon API will utilize fetch() in future versions)

The API is still being discussed, and is actively evolving based on community and browser vendor feedback.

If you want to experiment with the Pending Beacon in Chrome today, you can register for an Origin Trial for Chrome 107-115. Though, again, note that the current API shape (with the window.PendingGetBeacon and window.PendingPostBeacon interfaces) is evolving towards being an option of fetch() instead.

Why Pending Beacons?

One of the challenges highlighted in the Beaconing In Practice article is how to reliably send data once it’s been gathered in a web app.

Developers frequently use events such as beforeunload/unload or pagehide/visibilitychange as a trigger for beaconing their data, but these events are not reliably fired on all platforms. If the events don’t fire, the beacons don’t get sent.

For example, if you want to gather all of your data and only send it once as the page is unloading, registering for all 4 of those events will only give you ~82.9% reliability in ensuring the data arrives at your server, even when using the sendBeacon() API.

So, wouldn’t it be lovely if developers had a more reliable way of "queuing" data to be sent, and have the browser automagically send it once the page starts to unload? That’s where the Pending Beacon API comes in.

The Pending Beacon API gives developers a way to build a "pending" beacon. That pending beacon can then be mutated over time, or later discarded. The browser will then handle sending it (in its latest state) when the page is being hidden or unloading, so developers no longer need to listen to the beforeunload/unload/pagehide/visibilitychange events.

Ideally, Pending Beacon will be a mechanism that can replace usage of sendBeacon() in browsers that support it, giving more reliable delivery of beacon data and better developer ergonomics (by not having to listen for, and send data during, unload-ish events).

Pending Beacon Experiments

Given those goals, I was curious to see how reliable Pending Beacon would be compared to existing APIs like XMLHttpRequest (XHRs) or the sendBeacon() API. I performed three experiments comparing how reliably data arrived after using one of those APIs in different scenarios.

Let’s explore three questions:

  1. Can we swap PendingBeacon in for usage of XHR and/or sendBeacon() in unload event handlers?
  2. How reliable is asking PendingBeacon to send data "now" vs with a backgroundTimeout?
  3. How reliable is queuing PendingBeacon data to be sent at page unload vs. listening to event handlers and using sendBeacon() in them?

Methodology

Over the course of a month, on a site that I control (with approx 2M page views), I ran an experiment gathering data from browsers using the following three APIs:

  • XMLHttpRequest (XHR)
  • navigator.sendBeacon()
  • PendingGetBeacon

All of these APIs sent a small GET request back to the same domain / origin.

For all of the data below, I am only looking at Chrome and Chrome Mobile v107-115 (per the User-Agent string) with support for window.PendingGetBeacon, to ensure a level playing field. The data in Beaconing In Practice looks at reliability across all User-Agents, but the experiments below will focus solely on browsers supporting the Pending Beacon API.

Note that all of these tests were done with the PendingGetBeacon interface, before the current proposal to have this be a fetch() option. I’m unsure how the most recent proposal will affect these results, but I will re-do the test once that fetch() update is available.

Reliability of XMLHttpRequest vs. sendBeacon() vs. Pending Beacon in Event Handlers

The first question I wanted to know was: Can Pending Beacon be easily swapped into existing analytics libraries (like boomerang.js) to replace sendBeacon() and XMLHttpRequest (XHR) usage, and retain the same (or better) reliability (beacon received rate)?

In boomerang for example, we listen to beforeunload and pagehide to send our final "unload" beacon. Can we just use Pending Beacon instead?

For this experiment, I segmented visitors into 3 equally-distributed A/B/C groups (given Pending Beacon API support):

  • A: Force PendingGetBeacon (with { timeout: 0, backgroundTimeout: -1 } so it was sent immediately)
  • B: Force navigator.sendBeacon()
  • C: Force XMLHttpRequest

Each group then attempted to send 6 beacons per page load:

  1. Immediately in the <head> of the HTML
  2. In the page onload event
  3. In the page beforeunload event
  4. In the page unload event
  5. In the page pagehide event
  6. In the page visibilitychange event (for hidden)

By seeing how often each of those beacons arrived, we can consider the reliability of each API, during different page lifecycle events. I’m only showing data for page loads where the first step (sending data immediately in the <head>) occurred.

Let’s break the experimental data down by event first:

onload

The onload event is probably the most common event for an analytics library to fire a beacon. Marketing and performance analytics tools will often send their main payload at that point in time.

Based on our experimentation, when firing a beacon just at the onload event, sendBeacon() seems slightly more reliable than XHR, which is slightly more reliable than PendingGetBeacon.

reliability at onload

sendBeacon() being more reliable than XHR is expected — the whole point of sendBeacon() is to allow the browser to send data asynchronously of the page, in case it unloads after the beacon is queued up.

However, I’m surprised that PendingGetBeacon appears to be the least reliable (about 1% less reliable than XHR), at least from my experiments.

Broken down by Desktop and Mobile:

reliability at onload - desktop

reliability at onload - mobile

Desktop is able to deliver beacons more reliably across all 3 APIs than mobile. On mobile, PendingGetBeacon is about 2.8% less reliable than sendBeacon().

Note that the above results only measure a beacon sent immediately during the page’s onload event, without accounting for any abandons that happen prior to onload. That is why these numbers are so low: if a user abandoned the page prior to the onload event, they would not be counted in the above chart. See the additional breakdowns below for how these numbers change if you use the suggested abandonment strategy of listening to onload, pagehide and visibilitychange.

I was hoping the Pending Beacon API would be at least as reliable as sendBeacon(), so I think there’s something to investigate here.

pagehide or visibilitychange

If the intent is to measure events that occur in the page beyond the onload event, i.e. additional performance or reliability metrics (such as Core Web Vitals or JavaScript errors), tools can send a beacon during one of the page’s unload events, such as beforeunload, unload, pagehide or visibilitychange.

Our recommended strategy is to listen to just pagehide and visibilitychange (for hidden), and not listen to the beforeunload or unload events (which are less reliable and can break BFCache navigations).
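As a rough sketch of that strategy (not the exact code used in these experiments), a page could queue one beacon for whichever of the two events fires first. The /beacon endpoint, the payload and the sendOnce helper are placeholders made up for illustration:

```javascript
// Wraps `send` so it runs at most once, no matter how many events fire.
function sendOnce(send) {
  let sent = false;
  return function () {
    if (sent) {
      return false;
    }
    sent = true;
    send();
    return true;
  };
}

// Browser-only wiring; skipped when there is no window (e.g. in Node)
if (typeof window !== "undefined" && typeof navigator.sendBeacon === "function") {
  const flush = sendOnce(function () {
    // "/beacon" and the payload are hypothetical
    navigator.sendBeacon("/beacon", JSON.stringify({ when: Date.now() }));
  });

  window.addEventListener("pagehide", flush);
  document.addEventListener("visibilitychange", function () {
    if (document.visibilityState === "hidden") {
      flush();
    }
  });
}
```

The once-only guard matters because both events commonly fire for the same navigation, and you generally don’t want to count the same page view twice.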

So let’s look at the result of sending a beacon immediately during a pagehide or visibilitychange event (if a beacon was received for either event):

reliability at pagehide or visibilitychange

This is showing that sendBeacon() is still reigning supreme for reliability (95.8%), with PendingGetBeacon slightly behind (89.1%) and XHR trailing that (84.9%).

However, when we break it down by Desktop:

reliability at pagehide or visibilitychange - desktop

PendingGetBeacon is nearly as reliable as sendBeacon(), with XHR trailing behind, while on Mobile:

reliability at pagehide or visibilitychange - mobile

There appears to be a huge drop-off in reliability for PendingGetBeacon on Mobile vs. Desktop.

Possibly a bug with Pending Beacon in Chrome’s initial implementation here? This data would give me pause in swapping to Pending Beacon right now.

onload or pagehide or visibilitychange

Finally, let’s combine the above three events per the suggested abandonment strategy, and see how reliable each API is if we’re listening for all 3 events (and sending data once in any of them).

Of course, this increases the reliability of receiving beacons to the maximum possible, with sendBeacon() able to get a beacon to the server 98% of the time:

reliability at onload or pagehide or visibilitychange

Broken down by Desktop vs. Mobile, we see that Desktop has an extremely high rate of receiving beacons:

reliability at onload or pagehide or visibilitychange - desktop

While Mobile continues to show a possible issue with PendingGetBeacon vs. sendBeacon() (a 7.7% drop-off)!

reliability at onload or pagehide or visibilitychange - mobile

Conclusion

From this experiment at least, it appears sendBeacon() continues to be the most reliable way of sending beacon data.

If sending data during onload, sendBeacon() is slightly more reliable than PendingGetBeacon.

However, there appears to be a bug with PendingGetBeacon during a page-unloading scenario like pagehide or visibilitychange, particularly on Mobile. If the Chrome engineers can figure out a way to increase the reliability there, I would expect the Pending Beacon API to be equivalent to using sendBeacon() (which is our preferred mechanism today).

NOTE: I measured the reliability of sending beacons during beforeunload and unload as well, but since those events are deprecated / not-recommended / unreliable / break BFCache events, I’ll skip those results in this post.

Reliability of Pending Beacon "now" vs "backgroundTimeout"

The next experiment I ran was to determine if the backgroundTimeout functionality of the Pending Beacon API was reliable to use.

Here’s the description of the parameter (which has changed slightly with the fetch()-based proposal, but I would guess would operate similarly):

  • backgroundTimeout: A mutable Number property specifying a timeout in milliseconds whether the timer starts after the page enters the next hidden visibility state. If setting the value >= 0, after the timeout expires, the beacon will be queued for sending by the browser, regardless of whether or not the page has been discarded yet. If the value < 0, it is equivalent to no timeout and the beacon will only be sent by the browser on page discarded or on page evicted from BFCache. The timeout will be reset if the page enters visible state again before the timeout expires. Note that the beacon is not guaranteed to be sent at exactly this many milliseconds after hidden, because the browser has freedom to bundle/batch multiple beacons, and the browser might send out earlier than specified value (see Privacy Considerations). Defaults to -1.

In other words, ask the browser to send a beacon after backgroundTimeout milliseconds of being hidden.

This can be very useful as an alternative to listening to the pagehide / visibilitychange events for beaconing your "last" bits of data. If you regularly update your Pending Beacon, you may not need to listen to those events at all.

But can we trust the browser to still send our Pending Beacon, after we’ve queued it up?

For this experiment, I segmented visitors into 2 equally-distributed A/B groups (given Pending Beacon API support):

  • A: Force PendingGetBeacon to send a beacon now (with { timeout: 0, backgroundTimeout: -1 })
  • B: Force PendingGetBeacon to send a beacon after 60s or when the page is hidden (with { timeout: 60000, backgroundTimeout: 0 })

We are considering doing something similar to group B for boomerang.js, i.e. send all beacons within 60 seconds of the page load (so the data is still "real-time fresh" in dashboards), and asking the browser to send the data immediately if the user navigates away or closes the browser before then.
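For reference, the two groups could be set up roughly like this. This is a sketch based on the Origin Trial shape of the API as I understand it (a PendingGetBeacon constructor taking a URL and an options object); the /beacon URL, the group query parameter and the pendingBeaconOptions helper are hypothetical:

```javascript
// Options for the two A/B groups described above.
function pendingBeaconOptions(group) {
  if (group === "A") {
    // send now
    return { timeout: 0, backgroundTimeout: -1 };
  }
  // send after 60s, or as soon as the page is hidden
  return { timeout: 60000, backgroundTimeout: 0 };
}

// Only runs in browsers that expose the (experimental) interface
if (typeof window !== "undefined" && typeof window.PendingGetBeacon === "function") {
  const group = Math.random() < 0.5 ? "A" : "B";

  // "/beacon" is a hypothetical collection endpoint
  new window.PendingGetBeacon("/beacon?group=" + group, pendingBeaconOptions(group));
}
```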

Let’s look at the results of using PendingGetBeacon to send a beacon "now" vs. "when the page is hidden/unloads":

pendingbeacon now vs 60s/unload

Given a baseline of 100% meaning we received a "now" PendingGetBeacon, we’re seeing the 60s timeout + @hidden beacon about 98.5% of the time across Desktop and Mobile.

Desktop is slightly more reliable (99.7%) vs. Mobile (96.4%).

I think this is a great result, confirming the value-add of PendingGetBeacon. Instead of having to add event listeners for pagehide, visibilitychange and a 60s setTimeout(), the browser delivered the beacon very reliably on its own!

Remember, listening to all 4 unload-ish events (beforeunload, unload, pagehide, visibilitychange) and sending a beacon in those events only resulted in ~82.9% reliability!

And in the meantime, the pending beacon could be manipulated to add/remove additional data up until the page unloads.

Reliability of Pending Beacon "backgroundTimeout" once vs. sendBeacon() in Event Handlers

Given that the last experiment showed that Pending Beacon with backgroundTimeout was very reliable in sending beacons at page unload, what is the difference between using PendingGetBeacon with backgroundTimeout: 0 vs. listening for pagehide and visibilitychange and sending a beacon with sendBeacon()?

pendingbeacon vs sendBeacon for unload

Great news! Not only is the PendingGetBeacon more ergonomic (not having to listen for pagehide and visibilitychange events), it’s more reliably sending data when the page is unloading.

One interesting result I see here, is that the PendingGetBeacon reliability with backgroundTimeout: 0 was more reliable than listening to pagehide and visibilitychange and using PendingGetBeacon (now) in those events directly. This is likely due to the fact that pagehide and visibilitychange aren’t 100% reliable in the first place, but I would hope for it to be as-close-to sendBeacon() reliable as possible.

Misc Findings

  • There’s currently no way to debug Pending Beacon in Chrome Developer Tools: outgoing beacons are not visible in the Network tab. This makes it very hard to debug or verify that the feature is working. When debugging issues with boomerang.js I am constantly reviewing the outgoing beacon, so not having visibility into Dev Tools would be a huge hindrance. There’s a GitHub issue tracking this.
  • While reviewing the data, I found some Samsung Internet browser data in the data-set, indicating that it supported window.PendingGetBeacon.
    • However, I only received data for XMLHttpRequest and sendBeacon() beacons.
    • Does this mean Samsung Internet browser (which is Chromium-based) is registering window.PendingGetBeacon but not fully implementing the beacon sending? I will need to investigate more.

Follow-Ups

First, I want to say that my experiments and conclusions probably have some flaws. I’ve reviewed and re-reviewed my methodology and queries several times, but I am a human (I think!) and make mistakes. I’m hoping others can review this data.

Given that, some follow-ups I plan on doing based on the above findings:

  • Re-do this analysis once the interface is changed to be Fetch-based
  • Review the reliability data with the Chrome engineers to see if we should file bugs for any of the drops in reliability vs. sendBeacon(), in particular:
    • During onload Pending Beacon (with timeout:0) is about 1.2% less reliable than sendBeacon()
    • During pagehide and visibilitychange Pending Beacon (with timeout:0), on Mobile, is about 17.9% less reliable than sendBeacon()
  • Review why the Samsung Internet browser is registering the window.PendingGetBeacon interface but not sending any beacons (was there an error I wasn’t catching?)

TL;DR

  • I’m really excited for the Pending Beacon API. I think it’s going to give developers better ergonomics for sending data, and a more reliable way to send beacons at the end of the page lifetime
  • The new Pending Beacon API is in active development and going through a feedback and Origin Trial cycle
  • There may be some small reliability issues vs. sendBeacon() that should be investigated before widespread adoption
  • The navigator.sendBeacon API still seems to be the most reliable mechanism for sending beacons, if you’re queuing up data to be sent in pagehide or visibilitychange events

Modern Metrics

November 30th, 2022

At performance.now() 2022, I gave a talk titled "Modern Metrics (2022)".

Modern Metrics (2022)

Here’s the description:

What is a “modern” metric anyway? An exploration on how to measure and evaluate popular (and experimental) web performance metrics, and how they affect user happiness and business goals.

We’ll talk about how data can be biased, and how best to interpret performance data given those biases. We’ll look at a broad set of RUM data we’ve captured to see how the Core Web Vitals correlate (or not) to other performance and business metrics. Finally, we’ll share a new way that others can research modern metrics and RUM data.

At the conference, we also announced a new project called the RUM Archive. Inspired by other projects like archive.org and httparchive.org, we want to make RUM data available for public research. We’re regularly exporting aggregated RUM data from Akamai mPulse to start!

RUM Archive

I’ll blog more about the RUM Archive later!

You can watch the presentation on YouTube or catch the slides.

JS Self-Profiling API In Practice

December 31st, 2021

The JS Self-Profiling API

The JavaScript Self-Profiling API allows you to take performance profiles of your JavaScript web application in the real world from real customers on real devices. In other words, you’re no longer limited to only profiling your application on your personal machines (locally) from browser developer tools! Profiling your application is a great way to get insight into its performance. A profile will help you see what is running over time (its "stack"), and can identify "hot spots" in your code.

You may be familiar with profiling JavaScript if you’ve ever used a browser’s developer tools. For example, in Chrome’s Developer Tools in the Performance tab, you can record a profile. This profile provides (among other things) a view of what’s running in the application over time.

browser developer tools

In fact, this API actually reminds me a bit more of the simplicity of the old JavaScript Profiler tab, which is still available in Chrome, but hidden in favor of the new Performance tab.

Chrome's Developer Tools' old JavaScript Profiler tab

The JS Self-Profiling API is a new API, currently only available in Chrome versions 94+ (on Desktop and Android). It provides a sampling profiler that you can enable, from JavaScript, for any of your visitors.

The API is currently a WICG draft, and is being evaluated by browsers before possibly being adopted by a W3C Working Group such as the Web Performance WG.

What is Sampled Profiling?

There are two common types of performance profilers in use today:

  1. Instrumented (or "structured" or "tracing") Profilers, in which an application is invasively instrumented (modified) to add hooks at every function entry and exit, so the exact time spent in each function is known
  2. Sampled Profilers, which temporarily pause execution of the application at a fixed frequency to note ("sample") what is running on the call stack at that time

The JS Self-Profiling API starts a sampled profiler in the browser. This is the same profiler that records traces in browser developer tools.

The "sampling" part of the profiler means that the browser is basically taking a snapshot at regular intervals, checking what’s currently running on the stack. This is a lightweight way of tracing an application’s performance, as long as the sampling interval isn’t too frequent. Each regularly-spaced sampling interrupt quickly inspects the running stack and notes it for later. Over time, these sampled stacks can give you an indication of what was commonly running during the trace, though sometimes samples can also mislead (see Downsides below).

Consider a diagram of the function stacks running in an application over time. A sampling profiler will attempt to inspect the currently-running stack at regular intervals (the vertical red lines), and report on what it sees:

sampled profiler stacks

The other common method of profiling an application, often called an instrumented or tracing or structured profiler, relies on invasively modifying the application so that the profiler knows exactly when every function begins and ends. It provides an exact measurement of the relative time being spent in every function, as well as exact function call-counts. However, invasively hooking every function entry and exit has a lot of overhead, and slows down the application being measured (which ends up spending time in instrumentation).

Instrumented profiling has a time and place, but it’s generally not performed in the "real world" on your visitors — as it will slow down their experience. This is why sampled profiling is more popular on the web, as it has a smaller performance impact on the application being sampled.

With this API, you can choose the sampling frequency. In my testing, Chrome currently doesn’t let you sample any more frequently than once every 16ms (Windows) or 10ms (Mac / Android).

If you want to learn more about the different types of profiling, I highly recommend viewing Ilya Grigorik’s Structural and Sampling JavaScript Profiling in Google Chrome slides from 2012. It goes into further details about when to use the two types of profilers and how they complement each other.

Note: further in this document I may use the term "traces" to describe the data from a Sampled Profiler, not from a Tracing Profiler.

Downsides to Sampled Profiling

Unlike Instrumented Profilers that trace each function’s entry and exit (which increases the measurement overhead significantly), Sampled Profilers simply poll the stack at regular intervals to determine what’s running.

This type of lightweight profiling is great for reducing overhead, but it can lead to some situations where the data it captures is misleading at best, or wrong at worst.

Let’s look at the previous call stack and the 8 samples it took, pretending the samples were 10ms apart:

sampled profiler stacks

Since the Sampled Profiler doesn’t know any better, it guesses that any hit during its regular sampling interval was running for that entire interval, i.e. 10ms.

If a Sampled Profiler was examining that stack at those regular intervals (the vertical red lines), it would report the overall time spent in these stacks as:

  • A->B->C: 1 hit (10ms)
  • A->B: 2 hits (20ms)
  • A: 1 hit (10ms)
  • D: 2 hits (20ms)
  • idle: 2 hits (20ms)
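That bookkeeping is simple enough to sketch in a few lines. The order of the samples below is made up (only the hit counts come from the list above), and estimateTimeByStack is a hypothetical helper:

```javascript
// Attribute one full sampling interval to whatever stack each sample saw.
function estimateTimeByStack(sampledStacks, intervalMs) {
  const totals = {};
  for (const stack of sampledStacks) {
    totals[stack] = (totals[stack] || 0) + intervalMs;
  }
  return totals;
}

// 8 samples, 10ms apart (the sample order is an assumption)
const sampled = ["A->B->C", "A->B", "A->B", "A", "D", "D", "idle", "idle"];
const estimated = estimateTimeByStack(sampled, 10);

console.log(estimated);
```

The estimates always add up to the full 80ms, since every sample is credited with a whole interval regardless of what actually ran between samples.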

While this is a decent representation of what was running over those 80ms, it’s not entirely accurate:

  • A->B->C is over-reported by 6ms
  • A->B is over-reported by 12ms
  • A is under-reported by 8ms
  • D is over-reported by 8ms
  • D->D->D is missing and under-reported by 4ms
  • idle is under-reported by 15ms

This mis-reporting can get worse in a few canonical cases. Most application stacks won’t be this simple, so it’s unlikely you’ll see this happen exactly as-is in the real world, but it’s useful to understand.

First, consider a case where your sampled profiler is taking samples every 10ms, and your application has a task that executes for 2ms approximately every 16ms. Will the Sampled Profiler even notice it was running?

sampled profiler stacks - bad case

Maybe, or maybe not: it depends on when the sampling happens, and the frequency/runtime of the function. In this case, the function is executing for 12.5% of the runtime, but may go unreported.

Taken to the extreme, this same function may have the exact same interval frequency as the profiler, but only execute for that 1ms that was being sampled:

sampled profiler stacks - bad case

In this case, the function may be only running for 12.5% of the runtime, but may get reported as running 100% of the time.

To the other extreme, you could have a function which runs at 10ms intervals but only for 8ms:

sampled profiler stacks - bad case

Depending on when the Sampling Profiler hits, it may not get reported at all, even though it’s executing for 80% of the time.

All of these are "canonically bad" examples, but you could see how some types of program behavior may get mis-represented by a Sampled Profiler. Something to keep in mind as you’re looking at traces!
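To get a feel for these cases, here is a small simulation. It is a toy model under my own assumptions (a task that repeats on a fixed period, and a sampler that fires at exact intervals), not how the browser’s profiler is implemented; periodicTask and sampleTask are hypothetical helpers, and the numbers are chosen for illustration:

```javascript
// A task that runs for `runMs` at the start of every `periodMs` window.
function periodicTask(periodMs, runMs) {
  return function isRunning(timeMs) {
    return timeMs % periodMs < runMs;
  };
}

// Sample `isRunning` every `intervalMs`, starting at `offsetMs`.
function sampleTask(isRunning, intervalMs, durationMs, offsetMs) {
  let hits = 0;
  let samples = 0;
  for (let t = offsetMs; t < durationMs; t += intervalMs) {
    samples++;
    if (isRunning(t)) {
      hits++;
    }
  }
  return { hits: hits, samples: samples, reported: hits / samples };
}

// Runs 1ms out of every 10ms (10% of the time), sampled exactly when it runs
const overReported = sampleTask(periodicTask(10, 1), 10, 100, 0);

// Runs 8ms out of every 10ms (80% of the time), sampled in the 2ms gap
const missed = sampleTask(periodicTask(10, 8), 10, 100, 9);

// Runs 2ms out of every 16ms, sampled every 10ms over a short 30ms window
const shortWindow = sampleTask(periodicTask(16, 2), 10, 30, 5);

console.log(overReported.reported, missed.reported, shortWindow.reported);
```

With these inputs, the first task is reported as running 100% of the time, while the other two are never seen by the sampler at all.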

API

Document Policy

In order to allow the JavaScript Self-Profiling API to be called, there needs to be a Document Policy on the HTML page, called js-profiling. This is usually configured via an HTTP response header called Document-Policy, or via an <iframe policy=""> attribute.

A simple example of enabling the API would be this HTTP response header (for the HTML page):

Document-Policy: js-profiling

Once enabled, any JavaScript on the page can start profiling, including third-party scripts!
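If your pages are served by a Node.js server, setting the header could look something like the sketch below. The handler name and the HTML body are placeholders; only the Document-Policy: js-profiling header comes from the text above:

```javascript
// Request handler that opts the HTML page into the js-profiling policy.
function handlePage(req, res) {
  res.setHeader("Document-Policy", "js-profiling");
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<!doctype html><title>Profiled page</title>");
}

// Usage with Node's built-in http module:
//   require("http").createServer(handlePage).listen(8080);
```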

API Shape

The JS Self-Profiling API exposes a new Profiler object (in browsers that support it).

Creating the object starts the Sampled Profiler, and you can later call .stop() on the object to stop profiling and get the trace back (via a Promise).

if (typeof window.Profiler === "function") {
  var profiler = new Profiler({ sampleInterval: 10, maxBufferSize: 10000 });

  // do work
  profiler.stop().then(function(trace) {
    sendProfile(trace);
  });
}

Or if you’re into the whole await thing:

// note: await has to run inside an async function (or an ES module)
(async function() {
  if (typeof window.Profiler === "function") {
    const profiler = new Profiler({ sampleInterval: 10, maxBufferSize: 10000 });

    // do work
    const trace = await profiler.stop();
    sendProfile(trace);
  }
})();

The two main options you can set when starting a profile are:

  • sampleInterval is the application’s desired sample interval (in milliseconds)
    • Once started, the true sampling rate is accessible via profiler.sampleInterval
  • maxBufferSize is the desired sample buffer size limit, measured in number of samples

There is usually a measurable delay to starting a new Profiler(), as the browser needs to prepare its profiler infrastructure.

In my testing, I’ve found that new profiles usually take 1-2ms to start (e.g. before new Profiler() returns) on both desktop and mobile.
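If you want to check that start-up cost on your own pages, a sketch like the following could work. startProfilerTimed is a hypothetical helper, and the try/catch reflects my assumption that constructing a Profiler can fail (for example when the Document Policy is missing):

```javascript
// Starts a Profiler if the API is available, and reports how long
// the constructor took. Returns null if profiling could not start.
function startProfilerTimed(options) {
  if (typeof globalThis.Profiler !== "function") {
    return null; // API not supported
  }

  const before = performance.now();
  let profiler;
  try {
    profiler = new globalThis.Profiler(options);
  } catch (e) {
    // assumption: construction can fail, e.g. without the Document Policy
    return null;
  }

  return { profiler: profiler, startCostMs: performance.now() - before };
}

const started = startProfilerTimed({ sampleInterval: 10, maxBufferSize: 10000 });
if (started) {
  console.log("Profiler started in " + started.startCostMs + "ms");
}
```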

Sample Interval

The sampleInterval you specify (in milliseconds) determines how frequently the browser wakes up to take samples of the JavaScript call stack.

Ideally, you would want to choose an interval small enough to give you accurate data, without adding noticeable measurement overhead.

The draft spec suggests you simply need to specify a value greater than or equal to zero (though I’m not sure what zero would mean), and the User Agent may choose the rate that it ultimately samples at.

In practice, in Chrome 96+, I’ve found the following minimum sampling rates supported:

  • Windows Desktop: 16ms
  • Mac/Linux Desktop, Android: 10ms

Meaning, if you specify sampleInterval: 1, you will only get a sampling rate of 16ms on Windows.

You can verify the sampling rate that was chosen by the User Agent by inspecting the .sampleInterval of any started trace:

const profiler = new Profiler({ sampleInterval: 1, maxBufferSize: 10000 });
console.log(profiler.sampleInterval);

In addition, it appears in Chrome that the chosen actual sample interval is rounded up to the next multiple of the minimum, so 16ms (Windows) or 10ms (Mac/Android).

For example, if you choose a sampleInterval of between 91-99ms on Android, you’ll get 100ms instead.
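Put together, the behavior I observed can be modeled like this. It mirrors my observations in Chrome rather than anything the spec guarantees, and expectedSampleInterval is a hypothetical helper:

```javascript
// Round the requested interval up to the next multiple of the platform
// minimum (observed: 16ms on Windows, 10ms on Mac / Linux / Android).
function expectedSampleInterval(requestedMs, platformMinimumMs) {
  const multiples = Math.max(1, Math.ceil(requestedMs / platformMinimumMs));
  return multiples * platformMinimumMs;
}

console.log(expectedSampleInterval(1, 16));  // 16
console.log(expectedSampleInterval(95, 10)); // 100
```

Comparing this against profiler.sampleInterval on a real device is the only way to know what you actually got.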

Buffer

The other knob you control when starting a trace is the maxBufferSize. This is the maximum number of samples the Profiler will take before stopping on its own.

For example, if you specify a sampleInterval: 100 and a maxBufferSize: 10, you will get 10 samples of 100ms each, so 1s of data.

If the buffer fills, the samplebufferfull event fires and no more samples are taken.

if (typeof window.Profiler === "function")
{
  const profiler = new Profiler({ sampleInterval: 10, maxBufferSize: 10000 });

  async function collectAndSendProfile() {
    if (profiler.stopped) return;

    sendProfile(await profiler.stop());
  }

  profiler.addEventListener('samplebufferfull', collectAndSendProfile);

  // do work, or listen for some other event, then:
  // collectAndSendProfile();
}

Who to Profile

Should you enable a Sampled Profiler for all of your visitors? Probably not. While the observed overhead appears to be small, it’s best not to burden all visitors with sampling and collecting this data.

Ideally, you would probably sample your Sampled Profiler activations as well.

You could consider turning it on for 10% or 1% or 0.1% of your visitors, for example.

The main reasons you wouldn’t want to enable this for all visitors are:

  • While minimal, enabling sampling has some associated cost, so you probably don’t want to slow down all visitors
  • The amount of data produced by a sampled profiler trace is significant, and you probably don’t want your servers to have to deal with this data from every visitor
  • As of 2021-12, the only browser that supports this API is Chrome, so your profiles will be biased towards that browser, as well as the above downsides

Enabling the profiler for a sample of specific page loads, or a sample of specific visitors seems ideal.
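A per-page-load version of that decision can be as small as the sketch below. shouldProfile is a hypothetical helper, and the 1% rate is just an example:

```javascript
// Decide whether this page load should be profiled.
// `sampleRate` is a fraction between 0 and 1 (0.01 means 1% of page loads).
// `random` is optional and only exists to make the decision testable.
function shouldProfile(sampleRate, random) {
  const roll = typeof random === "function" ? random() : Math.random();
  return roll < sampleRate;
}

if (shouldProfile(0.01) && typeof globalThis.Profiler === "function") {
  // start the Profiler here
}
```

To sample visitors rather than individual page loads, the decision would need to be stored (for example in a cookie or localStorage) and reused on later pages.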

When to Profile

Now that you’ve determined that this current page or visitor should be profiled, when should you turn it on?

There are a lot of ways you can utilize profiling during a session: specific events, user interactions, the entire page load itself, and more.

Specific Operations

Your app probably has a few complex operations that it regularly executes for visitors.

Instrumenting these operations (on a sampled basis) may be useful in the cases where you don’t know how the code is flowing and performing in the real world. It could also be useful if you’re calling into third-party scripts where you don’t fully understand their cost.

You could simply start the Profiler at the beginning of the operation and stop it once complete.

The trace data you capture won’t necessarily be limited to just the code you’re profiling, but that can also help you understand if your operations are competing with any other code.

function loadExpensiveThirdParty() {
  const profiler = new Profiler({ sampleInterval: 10, maxBufferSize: 1000 });

  loadThirdParty(async function onThirdPartyComplete() {
      var trace = await profiler.stop();
      sendProfile(trace);
  });
}

User Interactions

User interactions are great to profile from time to time, especially if metrics like First Input Delay are important to you.

There are a couple approaches you could take regarding when to start the profiler when measuring user interactions:

  • Have one always running. When a user interacts, trim the profile to a short amount of time before and after the events
    • If you’re using EventTiming and have an active Profiler, you could measure from the event’s startTime to processingEnd to understand what was running before, during and as a result of the event
  • Turn on a Profiler once the mouse starts moving, or moving towards a known click-able target
  • Turn on a Profiler once there’s an event like mousedown where you expect the user to follow through with their interaction

If you wish to wait for a user interaction to start a profiler, note that creating a new Profiler() has a measurable cost (1-2ms) in many cases.

Here’s an example of having a long-running Profiler available for when there are user interactions, via EventTiming:

// sampling interval in milliseconds (example value)
const interval = 10;

// start a profiler that monitors the page at all times
let profiler = new Profiler({ sampleInterval: interval, maxBufferSize: 10000 });

// when there are impactful EventTiming events like 'click', filter to those samples and start a new Profiler
const observer = new PerformanceObserver(function(list) {
    list.getEntries().forEach(entry => {
        if (profiler && !profiler.stopped && entry.name === 'click') {
            profiler.stop().then(function(trace) {
                const filteredSamples = trace.samples.filter(function(sample) {
                    return sample.timestamp >= entry.startTime && sample.timestamp <= entry.processingEnd;
                });

                // do something with the filteredSamples and the event

                // start a new profiler
                profiler = new Profiler({ sampleInterval: interval, maxBufferSize: 10000 });
            });
        }
    });
});

// observe() returns undefined, so call it separately to keep the observer reference
observer.observe({type: 'event', buffered: true});

Page Load

If you want to profile the entire Page Load process, it’s best to start the Profiler via an inline <script> tag before any other Scripts in the <head> of your document.

You could then wait for the page’s onload event, plus a delay, before processing/sending the trace.

You may also want to listen to the pagehide or visibilitychange events to determine if the visitor abandons the page before it fully loads, and send the profile then. Note there are challenges when sending from unload events.
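Here’s one way those pieces could fit together, as a sketch. profilePageLoad, the 5 second delay and the onTrace callback are my own placeholders. Since stop() is asynchronous, the trace may not be ready before the page goes away in the pagehide / visibilitychange cases, which is part of the challenge with unload events:

```javascript
// Start profiling immediately, then stop exactly once: after the load event
// plus a delay, or earlier if the page is hidden or unloading.
function profilePageLoad(win, doc, onTrace, delayAfterLoadMs) {
  if (typeof win.Profiler !== "function") {
    return null; // API not supported
  }

  let profiler;
  try {
    profiler = new win.Profiler({ sampleInterval: 10, maxBufferSize: 10000 });
  } catch (e) {
    return null; // assumption: e.g. the Document Policy is missing
  }

  let finished = false;
  function finish() {
    if (finished) {
      return;
    }
    finished = true;
    profiler.stop().then(onTrace);
  }

  win.addEventListener("load", function () {
    win.setTimeout(finish, delayAfterLoadMs);
  });
  win.addEventListener("pagehide", finish);
  doc.addEventListener("visibilitychange", function () {
    if (doc.visibilityState === "hidden") {
      finish();
    }
  });

  return profiler;
}

// Browser-only usage, from an inline <script> early in the <head>
if (typeof window !== "undefined") {
  profilePageLoad(window, document, function (trace) {
    // process or send the trace here
  }, 5000);
}
```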

If you’re measuring other important aspects, metrics and events of the Page Load process, like Long Tasks or EventTiming events, having a Sampled Profiler trace to understand what was running during those events can be very enlightening.

Overhead

Any time you enable a profiler, the browser will be doing extra work to capture the performance data. Luckily a Sampled Profiler is a bit cheaper to do than an Instrumented Profiler, but what is its cost in the real-world?

Facebook, one of the primary drivers of this API, has reported that initial data suggests enabling profiling slows load time by <1% (p=0.05).

In my own experimentation on one of my websites, there was no noticeable difference in Page Load times between sessions with profiling enabled and those without.

This is great news, though I would love to see more experimentation and evaluation of the performance impacts of this API. If you’ve used the JS Self-Profiling API, please share your experimentation results!

Anatomy of a Profile

The profile trace object returned from the Profiler.stop() Promise callback is described in the spec’s appendix, and contains four main sections:

  • frames contains an array of frames, i.e. individual functions that could be part of a stack
    • You may see DOM functions (such as set innerHTML) or even Profiler (for work the Sampled Profiler is doing) here
    • If a frame is missing a name it’s likely JavaScript executing in the root of a <script> tag or external JavaScript file, see this note for a workaround
  • resources contains an array of all of the resources that contained functions that have a frame in the trace
    • The page itself is often (always?) the first in the array, with any other external JavaScript files or pages following
  • samples are the actual profiler samples, with a corresponding timestamp for when the sample occurred and a stackId pointing at the stack executing at that time
    • If there is no stackId, nothing was executing at that time
  • stacks contains an array of frames that were running on the top of the stack
    • Each stack may have an optional parentId, which maps into the next node of the tree for the function that called it (and so forth)

This format is unique to the JS Self-Profiling API, and cannot be used directly in any other tool (at the moment).

Here’s a full example:

{
  "frames": [
    { "name": "Profiler" }, // the Profiler itself
    { "column": 0, "line": 100, "name": "", "resourceId": 0 }, // un-named function in root HTML page
    { "name": "set innerHTML" }, // DOM function
    { "column": 10, "line": 10, "name": "A", "resourceId": 1 }, // A() in app.js
    { "column": 20, "line": 20, "name": "B", "resourceId": 1 } // B() in app.js
  ],
  "resources": [
    "https://example.com/page",
    "https://example.com/app.js",
  ],
  "samples": [
      { "stackId": 0, "timestamp": 161.99500000476837 }, // Profiler
      { "stackId": 2, "timestamp": 182.43499994277954 }, // app.js:A()
      { "timestamp": 197.43499994277954 }, // nothing running
      { "timestamp": 213.32999992370605 }, // nothing running
      { "stackId": 3, "timestamp": 228.59999990463257 }, // app.js:A()->B()
  ],
  "stacks": [
    { "frameId": 0 }, // Profiler
    { "frameId": 2 }, // set innerHTML
    { "frameId": 3 }, // A()
    { "frameId": 4, "parentId": 2 } // A()->B()
  ]
}

To figure out what was running over time, you look at the samples array, each entry containing a timestamp of when the sample occurred.

For example:

"samples": [
  ...
  { "stackId": 3, "timestamp": 228.59999990463257 }, // app.js:A()->B()
  ...
]

If that sample does not contain a stackId, nothing was executing.

If that sample contains a stackId, you look it up in the stacks: [] array by the index (3 in the above):

"stacks": [
  ...
  2: { "frameId": 3 }, // A()
  3: { "frameId": 4, "parentId": 2 } // A()->B()
]

We see that stackId: 3 is frameId: 4 with a parentId: 2.

If you follow the parentId chain recursively, you can see the full stack. In this case, there are only two frames in this stack:

frameId:4
frameId:3

From those frameIds, look into the frames: [] array to map them to functions:

"frames": [
...
  3: { "column": 10, "line": 10, "name": "A", "resourceId": 1 } // A() in app.js
  4: { "column": 20, "line": 20, "name": "B", "resourceId": 1 } // B() in app.js
],

So the stack for the sample at 228.59999990463257 above is:

B()
A()

Meaning, A() called B().
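The same lookup can be done in code. Below is a minimal sketch that walks the parentId chain for a given stackId; resolveStack is a hypothetical helper, the "(anonymous)" label for un-named frames is my own placeholder, and the trace is an abbreviated copy of the example above:

```javascript
// Walk the parentId chain for a stackId, returning function names
// from the innermost frame to the outermost caller.
function resolveStack(trace, stackId) {
  const names = [];
  let id = stackId;
  while (id !== undefined) {
    const stack = trace.stacks[id];
    const frame = trace.frames[stack.frameId];
    names.push(frame.name || "(anonymous)");
    id = stack.parentId;
  }
  return names;
}

// Abbreviated version of the example trace
const exampleTrace = {
  frames: [
    { name: "Profiler" },
    { column: 0, line: 100, name: "", resourceId: 0 },
    { name: "set innerHTML" },
    { column: 10, line: 10, name: "A", resourceId: 1 },
    { column: 20, line: 20, name: "B", resourceId: 1 }
  ],
  stacks: [
    { frameId: 0 },
    { frameId: 2 },
    { frameId: 3 },
    { frameId: 4, parentId: 2 }
  ]
};

console.log(resolveStack(exampleTrace, 3).join(" <- ")); // prints "B <- A"
```

Running this over every entry in samples (skipping those without a stackId) gives you a readable timeline of what was executing.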

Beaconing

Once a Sampled Profile trace is stopped, what should you do with the data? You probably want to exfiltrate the data somehow.

Depending on the size of the trace, you could either process it locally first (in the browser), or just send it raw to your back-end servers for further analysis.

If you will be sending the trace elsewhere for processing, you will probably want to gather supporting evidence with it to make the trace more actionable.

For example, you could gather alongside the trace:

  • Performance metrics, such as Page Load Time or any of the Core Web Vitals
    • These can help you understand if the Sampled Profile trace is measuring a user experience that was "good" vs. "bad"
  • Supporting performance events, such as Long Tasks or EventTiming events
    • These can help you understand what was happening during "bad" events by correlating samples with events such as Long Tasks
  • User Experience characteristics, such as User Agent / Device information, page dimensions, etc
    • These can help you slice-and-dice your data, and help narrow down your search if you come across patterns of "bad" experiences

Sampled Profiles are most helpful when you can understand the circumstances under which they were taken, so make sure you have enough information to know whether the trace is a "good" user experience or a "bad" one.

Size

Depending on the frequency (sampleInterval) and duration (or maxBufferSize) of your profiles, the resulting trace data can be 10s or 100s of KB! Simply taking the JSON.stringify() representation of the data may not be the best choice if you intend on uploading the raw trace to your server.

In a sample of ~50,000 profiles captured from my website, where I was profiling from the start of the page through 5 seconds after Page Load, the traces averaged about 25 KB in size. The median page load time on this site is about 2 seconds, so these traces captured about 7 seconds of data. These traces are essentially the JSON.stringify() output of the trace data.

The good news is that 25 KB is small enough that you could take the simplest approach and upload the trace directly to a server for processing.

Compression

You also have a few other options for reducing the size of this data before you upload, if you’re willing to trade some CPU time.

One option is the Compression Stream API, which gives you the ability to get a gzip-compressed stream of data from your string input. It should be available (in Chrome) whenever the JS Self-Profiling API is available. One downside is that it is (currently) async-only, so you will need to wait for a callback with the compressed bytes, before you can upload your compressed profile data.
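As a rough sketch of what that could look like, here is one way to gzip a trace with CompressionStream. The function name compressTrace() is my own, and it assumes an environment where CompressionStream, Blob and Response are all available.

```javascript
// Gzip-compresses a trace via the Compression Streams API.
// Resolves with a Uint8Array containing the compressed bytes.
async function compressTrace(trace) {
  const json = JSON.stringify(trace);

  // Blob gives us a ReadableStream that we can pipe through the compressor
  const compressedStream = new Blob([json])
    .stream()
    .pipeThrough(new CompressionStream('gzip'));

  // Response is a convenient way to collect a stream into a single buffer
  const buffer = await new Response(compressedStream).arrayBuffer();
  return new Uint8Array(buffer);
}

// Usage (the upload step is up to you):
// compressTrace(trace).then(function(bytes) { /* beacon the bytes */ });
```

Your server would need to know that the payload is gzip-compressed, so plan for a way to signal that (for example, a dedicated endpoint).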

If you expect to send this data via the application/x-www-form-urlencoded encoding, be aware that URL-encoding JSON.stringify() strings results in a much larger string. For example, a 25 KB JSON object from JSON.stringify() grows to about 36 KB if application/x-www-form-urlencoded encoded.

To avoid this bloat, you could alternatively consider something like JSURL. JSURL is an interesting library that looks similar to JSON, but encodes a bit smaller for URL-encoded data (like application/x-www-form-urlencoded data).

Besides these generic compression methods that can be applied to any string data, someone smart could probably come up with a domain-specific compression scheme for this data if they desired! Please!

Analyzing Profiles

Once you’ve started capturing these profiles from your visitors and have been beaconing them to your servers, now what?

Assuming you’re sending the full trace data (and not doing profile analysis in the browser before beaconing), you have a lot of data to work with.

Let’s split the discussion between looking at individual profiles (for debugging) and in bulk (aggregate analysis).

Individual Profiles

As far as I’m aware, there aren’t any openly-available ways of visualizing this trace data in any of the common browser developer tools.

While the JS Self-Profiling API Readme mentions that Mozilla's perf.html visualization tool for Firefox profiles or Chrome's trace-viewer (chrome://tracing) UI could be trivially adapted to visualize the data produced by this profiling API, I do not believe this has been done yet.

Ideally, someone could either update one of the existing visualization tools, or write a converter to change the JS Self-Profiling API format into one of the existing formats. I have seen a comment from a developer that the Specto visualization tool may be able to display this data soon, which would be great!

Until then, I don’t think it’s very feasible to review individual traces "by hand".

With the knowledge of the trace format and just a little bit of code, you could easily post-process these traces to pick out interesting aspects of the traces. Which brings us to…

Bulk Profile Analysis

Given a large number of sampled profiles, what insights could you gain from them?

This is an inherently challenging problem. Given a sample of visitors with tracing enabled, and each trace containing KB or MB of trace data, knowing how to effectively use that data to narrow down performance problems is no easy feat.

The infrastructure required to do this type of bulk analysis is not insignificant, though it really boils down to post-processing the traces and aggregating those insights in ways that make sense.

As a starting point, there are at least a few ways of distilling sampled profile traces down into smaller data points. By aggregating this type of information for each trace, you may be able to spot patterns, such as which hot functions are more often seen in slower scenarios.

For example, given a single sampled profile trace, you may be able to extract its:

  • Top N function(s) (by exclusive time)
  • Top N function(s) (by inclusive time)
  • Top N file(s)
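Here is a minimal sketch of extracting the first two of those from a single trace. It counts samples rather than time (multiply by your sampleInterval for an approximation), and it groups by function name only, which is a simplification: same-named functions in different files would be merged. summarizeTrace() and topN() are names I made up.

```javascript
// Counts samples per function name for a single trace.
//   exclusive: the function was at the top of the stack (actually executing)
//   inclusive: the function appeared anywhere in the stack
function summarizeTrace(trace) {
  const totals = new Map();
  const bump = (name, key) => {
    if (!totals.has(name)) {
      totals.set(name, { name, exclusive: 0, inclusive: 0 });
    }
    totals.get(name)[key]++;
  };

  for (const sample of trace.samples) {
    if (sample.stackId === undefined) continue; // nothing was executing

    const seen = new Set();
    let isTop = true;
    let id = sample.stackId;
    while (id !== undefined) {
      const stack = trace.stacks[id];
      const name = trace.frames[stack.frameId].name || '(anonymous)';
      if (isTop) {
        bump(name, 'exclusive');
        isTop = false;
      }
      if (!seen.has(name)) { // count recursive calls once per sample
        bump(name, 'inclusive');
        seen.add(name);
      }
      id = stack.parentId;
    }
  }
  return [...totals.values()];
}

// Returns the top N entries by 'exclusive' or 'inclusive' sample count
function topN(summary, key, n) {
  return [...summary].sort((a, b) => b[key] - a[key]).slice(0, n);
}
```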

If you captured other supporting information alongside the profile, such as Long Tasks or EventTiming events, you could provide more context to why those events were slow as well!

By aggregating this information in a traditional analytics engine, you may be able to gain insight into which code to focus on.

Gotchas

Of course, no API is perfect, and there are a few ways this API can be confusing, misleading, or hard to use.

Here are a few gotchas I’ve encountered.

Minified JavaScript

If your application contains minified JavaScript, the Sampled Profiles will report the minified function names.

If you will be processing profiles on your server, you may want to un-minify them via the Source Map artifacts from the build.

Named Functions

One issue that I came across while testing this API on personal websites was that I was finding a lot of work triggered by "un-named" functions:

{
  "frames": [
    ...
    { "column": 0, "line": 10, "name": "", "resourceId": 0 }, // un-named function in root HTML page
    { "column": 0, "line": 52, "name": "", "resourceId": 0 }, // another un-named function in root HTML page
    ...
  ],

These frames were coming from the page itself (resourceId: 0), i.e. inline <script> tags.

They’re hard to map back to the original function in the HTML, since the page’s HTML may differ by URL or by visitor.

One thing that helped me group these frames better was to change each inline <script>’s JavaScript so that it runs from a named function.

e.g. instead of:

<script>
// start some work
</script>

Simply wrap it in a named IIFE (Immediately Invoked Function Expression):

<script>
(function initializeThirdPartyInHTML() {
  // start some work
})();
</script>

Then the frames array provides better context:

{
  "frames": [
    ...
    { "column": 0, "line": 10, "name": "initializeThirdPartyInHTML", "resourceId": 0 }, // now with 100% more name!
    { "column": 0, "line": 52, "name": "doOtherWorkInHtml", "resourceId": 0 },
    ...
  ],

Cross-Origin Scripts

When the API was first being developed and experimented with, it came with a requirement that the page being profiled have cross-origin isolation (COI) via COOP and COEP. If any third-party script did not enable COOP/COEP, then the API could not be used.

This requirement unfortunately made the API nearly useless for any site that includes third-party content, as forcing those third-parties into COOP/COEP compliance is tricky at best.

Thankfully, after some discussion, the implementation in Chrome was updated, and the COI requirement was dropped.

However, there are still major challenges when you utilize third-party scripts. In order to not leak private information from third-party scripts, they are treated as opaque unless they opt-in to CORS. This is primarily to ensure their call stacks aren’t unintentionally leaked, which may include private information. Any cross-origin JavaScript that is in a call-stack will have its entire frame removed unless it has a CORS header.

This is analogous to the protections that cross-origin scripts have in JavaScript error events, where detailed information (line/column number) is only available if the script is same-origin or CORS-enabled.

When applied to Sampled Profiles, this has some strange side-effects.

For any cross-origin script (that has not opted in to CORS) that has a frame in a sample, its entire frame will be removed, without any indication that this has been done. As a result, some of the stacks may be misleading or confusing.

Consider a case where your same-origin JavaScript calls into one or more cross-origin functions:

[Figure: sampled profiler with cross-origin content]

Guess what the profiler will report?

  • sameOriginFunction() 20ms

Even though the two functions crossOriginFunctionA() and crossOriginFunctionB() accounted for most of the runtime, the JS Self-Profiling API will remove those frames entirely from the report, and limit its reporting to sameOriginFunction().

It’s even stranger if those cross-origin functions call back into same-origin functions. Consider a third-party utility library like jQuery that might do this:

[Figure: sampled profiler with cross-origin content]

The profiler will report:

  • sameOriginFunction() 10ms
  • sameOriginFunction() -> sameOriginCallback() 10ms

In other words, it pretends the cross-origin functions don’t even exist. This could make debugging these types of stacks very confusing!

To ensure your third-party scripts are CORS-enabled, you need to do two things:

  1. The origin serving the third-party JavaScript needs to have the Access-Control-Allow-Origin HTTP response header set
  2. The embedding HTML page needs to set <script src="..." crossorigin="anonymous"></script>

Once these have been set, the third-party JavaScript will be treated the same as any same-origin content, and its frame, function, line and column information will be available.

Sending from Unload Events

One challenge with using the JS Self-Profiling API is that to get the trace data, you need to rely on a Promise (callback) from .stop().

As a result, you really can’t use this function in page unload handlers like beforeunload or unload, where promises and callbacks may not get the chance to fire before the DOM is destroyed.

So if you want to use the JS Self-Profiling API, you won’t be able to wait until the page is being unloaded to send your profiles. If you want to profile a session for a long time, you would need to consider breaking up the profiles into multiple pieces and beacon at a regular interval to ensure you received most (but probably not the final) trace.
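A minimal sketch of that "break it into pieces" idea is below. rotateProfiler() is a name I made up, and createProfiler and send are hypothetical callbacks you would supply, which also makes the helper easy to test outside of a browser.

```javascript
// Stops the current profiler, hands its trace to `send`, and returns
// a freshly started profiler to use for the next interval.
async function rotateProfiler(current, createProfiler, send) {
  const trace = await current.stop(); // .stop() resolves with the trace
  send(trace);
  return createProfiler();
}

// In the browser, usage might look like this (the interval, the options
// and the /profiles URL are all assumptions):
//
// function create() {
//   return new Profiler({ sampleInterval: 10, maxBufferSize: 10000 });
// }
// var profiler = create();
// setInterval(async function() {
//   profiler = await rotateProfiler(profiler, create, function(trace) {
//     navigator.sendBeacon('/profiles', JSON.stringify(trace));
//   });
// }, 30000);
```

There will be a small gap between stopping one profiler and starting the next, so a few samples around each rotation may be missed.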

This is unfortunate for one scenario, which is page loads that are delayed due to a third-party resource or other heavy site execution. I would expect many consumers of this API to trace from the beginning of the page to the load event. But if the visitor leaves the page before it fully loads (say due to a delayed third-party resource), the unload event will fire before the load event, and there will be no opportunity to get the callback from the Profiler.stop().

I’ve filed an issue to see if there are any better ways of addressing unload scenarios.

Non-JavaScript Browser Work

One of the issues with the current profiler is that non-JavaScript execution isn’t represented in profiles.

As a result, top-level User Agent work like HTML Parsing, CSS Style and Layout Calculation, and Painting will appear as "empty" samples.

Other activity like JavaScript garbage collection (GC) will also be "empty" in samples.

There is a proposal for the User Agent to add optional "markers" for specific samples, if it wants the profiler to know about non-JavaScript work:

enum ProfilerMarker { "script", "gc", "style", "layout", "paint", "other" };

...
"samples" : [
  { "timestamp" : 100, "stackId": 2, "marker": "script" },
  { "timestamp" : 110, "stackId": 2, "marker": "gc" },
  { "timestamp" : 120, "stackId": 3, "marker": "layout" },
  { "timestamp" : 130, "stackId": 2, "marker": "script" },
  { "timestamp" : 140, "stackId": 2, "marker": "script" },
]
...

This is still just a proposal, but if implemented it will provide a lot more context of what the browser is doing in profiles.

Conclusion

The JS Self-Profiling API is still under heavy development, experimentation and testing. There are open issues in the GitHub repository where work is being tracked, and I would encourage anyone utilizing the API to post feedback there.

We’ve heard feedback from Facebook and Microsoft and others that the API has been useful in identifying and fixing performance issues from customers.

Looking forward to hearing others giving the API a try and their results!

Beaconing In Practice

December 28th, 2020

Introduction

Lighthouse modified via vecteezy.com

  • Step 1: Gather the data!
  • Step 2: ???
  • Step 3: Profit!

Let’s say you have a website, and you want to find out how long it takes your visitors to see the Largest Contentful Paint on your homepage.

Or, let’s say you want to track how frequently your visitors are clicking a button during the Checkout process.

Or, let’s say you want to use the new Measure Memory API to track JavaScript memory usage over time, because you’re concerned that your Single Page App might have a leak.

Or, let’s say you work on a performance analytics library that automatically captures performance metrics all throughout the Page Load and beyond.

For each of those scenarios, you may end up using one of the many exciting JavaScript APIs or libraries to capture, query, track or observe key metrics.

That’s the easy part!

The hard part is making sure your back-end actually receives that data in a reliable way. If your telemetry hasn’t been received, the experience never happened! What’s worse, you may not even know that you don’t know it happened!

So, I’d argue that Step 2 is just as important as Step 1:

  • Step 1: Gather the data!
  • Step 2: Beacon the data!
  • Step 3: Profit!

This article will look at several strategies for reliably exfiltrating telemetry — aka beaconing. We will cover when and how to send beacons, and gotchas you should watch out for.

This article was written by one of the authors of Boomerang, an open-source RUM performance monitoring library that sends a lot of beacons (1 billion+ a day!). We were taking a look at how and when we send beacons to make sure we’re sending them as optimally as possible, especially to make sure we’re not missing beacons due to listening to the wrong (or too many) events. See our findings in the TL;DR section!

Beacons

Each of the scenarios above covers a different way that websites can collect telemetry. What is telemetry? Wikipedia says:

Telemetry is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment (telecommunication) for monitoring

Any sort of measurement, whether it’s for performance, marketing or just curiosity, is telemetry data. We generally collect telemetry to improve our websites, our services and our visitors’ experiences.

Your website may have its own internal telemetry that tracks application health, or you may rely on third-party marketing or performance analytics libraries to collect data for you automatically.

An essential part of collecting telemetry is making sure that it is reliably sent (exfiltrated) so you can actually use it (in bulk).

In analytics terms, we often call sending telemetry beaconing, and the HTTPS payload that carries the data the beacon.

Beaconing Stages

Every time you collect some data, you should have a strategy for when you’re going to get that data out of the browser.

This sounds simple, but depending on the type of data you’re tracking, when you send it matters just as much as collecting it.

Let’s look at some common scenarios:

Sending Data at Startup

Sometimes, you just want to log that a thing happened. For example, you can log when a Page Load occurred and maybe include a few extra bits of details, like the URL that was loaded or characteristics of the browser.

As long as you’re not waiting on anything else, in this case, it makes sense to beacon immediately after the analytics code has loaded.

Many marketing analytics scripts, such as Google or Adobe Analytics, fall into this bucket. As soon as their JavaScript libraries are loaded, they may immediately send a beacon noting that "this Page Load happened" with supporting details about the Page Load’s dimensions.

// pseudo code
function onStartup() {
    // gather the data
    sendBeacon();
}

Good for:

  • Quick marketing-level analytics
  • Highly reliable

Bad for:

  • Collecting any Page Load performance data
  • Measuring anything that happens after the page has loaded (e.g. user interactions or post-Load content)

Gathering Data through the Page Load

Some websites use Real User Monitoring (RUM) to track the performance of each Page Load. Since you’re waiting for the Page Load to finish, you can’t immediately send a beacon when the JavaScript starts up. Generally, you’ll need to wait for at least the Page Load (onload) event, and possibly longer if you have a Single Page App.

To do so, you would normally register for an onload handler, then send your data immediately after the onload event has finished.

Performance analytics libraries such as boomerang.js or SpeedCurve’s LUX will wait until the Page Load (or SPA Page Load) events before beaconing their data.

// pseudo code
function onStartup() {
    window.addEventListener('load', function(event) {
        // you may want to capture more data now, such as the total Page Load time
        gatherMoreData();

        sendBeacon();
    });

    // you could collect some details now, such as the page URL
    gatherSomeData();
}

Note: You may want to delay your beacon until slightly after onload to ensure your analytics tool doesn’t cause a lot of work at the same time other onload handlers are executing:

// pseudo code
function onStartup() {
    window.addEventListener('load', function(event) {
        // wait a little bit until Page Load activity dies down
        setTimeout(function() {
            // you may want to capture more data now, such as the total Page Load time
            gatherMoreData();

            sendBeacon();
        }, 500);
    });

    // you could collect some details now, such as the page URL
    gatherSomeData();

    // ALSO!  Have an unload strategy
}

Good for:

  • Gathering performance analytics

Bad for:

  • Measuring anything that happens after the page has loaded (e.g. user interactions or post-Load content)
  • Waiting only for the Page Load event means you will miss data from any user that abandons the page prior to Page Load
  • Make sure you have an unload strategy to capture abandons.

Incrementally Gathering Telemetry throughout a Page’s Lifetime

After the page has loaded, there may be user interactions or other periodic changes to the page that you want to track.

For example, you may want to measure how many times a button is clicked, or how long it takes for that button click to result in a UI change.

This type of on-the-fly data collection can often be exfiltrated immediately, especially if you’re tracking events in real-time:

// pseudo code
myButton.addEventListener('click', function(event) {
    sendBeacon();
});

You could also consider batching these types of events and sending the data periodically. This may save a bit of CPU and network activity:

// pseudo code
var dataBuffer = [];
myButton.addEventListener('click', function(event) {
    dataBuffer.push(...);
});

// send every 10 seconds if there's new data
setInterval(function() {
    if (dataBuffer.length) {
        sendBeacon(dataBuffer);
        dataBuffer = [];
    }
}, 10000);

Good for:

  • Real time event tracking

Bad for:

  • If you’re batching data, you should have an unload strategy to ensure it goes out before the user leaves

Gathering Data up to the End of the Page

Some types of metrics are continuous, happening or updating throughout the page’s lifecycle. You don’t necessarily want to send a beacon for every update to those metrics — you just want to know the "final" result.

One simple example of this is when measuring Page View Duration, i.e. how long the user spent reading or viewing the page. Sure, you could send a beacon every minute ("they’ve been viewing for [n] minutes!"), but it’s a lot more efficient to just send the final value ("they were here for 5 minutes!") once, when the user is navigating away.

If you’re interested in Google’s Core Web Vitals metrics, you should probably track Cumulative Layout Shift (CLS) beyond just the Page Load event. If Layout Shifts happen post-page-load, those also affect the user experience. CLS is a score that incrementally updates with each Layout Shift, so you shouldn’t necessarily beacon on each Layout Shift — you just want the final CLS value, after the user leaves the page.

Another example would be for the Measure Memory API, which lets you track memory usage over time. If your Single Page App is alive for 3 hours (over many interactions), you may only want to send one final beacon with how the memory behaved over that lifetime.

For these cases, your best bet is to listen for a page lifecycle indicator like the pagehide event, and send data as the user is navigating away. The specific events you want to listen for are a little complex, so read up on unload strategies later.

// pseudo code
var clsScore = 0;

// don't listen for just pagehide!  see unload strategies section
window.addEventListener('pagehide', function(event) {
    sendBeacon();
});

// Listen for each Layout Shift
var po = new PerformanceObserver(function(list) {
  var entries = list.getEntries();
  for (var i = 0; i < entries.length; i++) {
    if (!entries[i].hadRecentInput) {
      clsScore += entries[i].value;
    }
  }
});

po.observe({type: 'layout-shift', buffered: true});

Good for:

  • Continuous metrics that are updated over time, and you only want the final value

Bad for:

  • Real time metrics — these will be delayed until the user actually navigates away
  • Reliability — you will lose some of this data just because unload events aren’t as reliable, so have an unload strategy

"Whenever"

Sometimes you may want to track metrics or events, but you don’t necessarily need to send the data immediately (because it doesn’t need to be Real Time data). In fact, it may be advantageous to delay sending until another beacon has to go out. For example, as a later beacon is flushed, you can tack on additional data as needed.

In this case, you may want to:

  • Send data on the next outgoing beacon, if any
  • Send batched data periodically, if desired
  • Send any un-sent data at the end of the page

To do this, you would use a combination of the strategies above — using queuing/batching and unload beacons.
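One way to sketch that combination is a small queue that piggybacks on the next outgoing beacon. createDeferredQueue() and the payload shape are made up for illustration, and send stands in for whichever beaconing mechanism you end up using.

```javascript
// Creates a queue for low-priority data that rides along on the next beacon.
// `send` is a hypothetical function that transmits one payload object.
function createDeferredQueue(send) {
  var pending = [];

  return {
    // Queue data that does not need to go out right away
    defer: function(item) {
      pending.push(item);
    },

    // Send a beacon now, tacking on anything that was deferred
    sendNow: function(data) {
      send({ data: data, deferred: pending });
      pending = [];
    },

    // Call this periodically (if desired) and from your unload handlers.
    // Returns true if there was something to send.
    flush: function() {
      if (pending.length === 0) {
        return false;
      }
      send({ data: null, deferred: pending });
      pending = [];
      return true;
    }
  };
}
```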

Good for:

  • Minimizing beacon counts

Bad for:

  • Real-time metrics
  • Reliability — you will lose some of this data just because unload events aren’t as reliable, so have an unload strategy

How Many Beacons?

Depending on the data you’re collecting, and how you’re considering exfiltrating it, you may have the choice to send a single beacon, or multiple beacons. Each has its own advantages and disadvantages, from the client’s (browser’s) perspective, as well as the server’s.

A Single Beacon

A single beacon is the simplest way to send your data. Collect all of your data, and when you’re done, send out a single beacon and stop processing. This is frequently how marketing and performance analytics beacons are implemented, when sending the results of a single Page Load.

Good for:

  • Less processing (CPU) time in the client
  • Less network egress bytes (less protocol overhead of a single network request vs. multiple requests)
  • Easier on the back-end — all data relating to the user experience is in one beacon payload, so the server doesn’t have to stitch it back together later

Bad for:

  • Real-time metrics, unless you’re sending the beacon early in the Page Load cycle (immediately or at onload).
  • Capturing data after the beacon has been sent

Multiple Beacons

If you’re collecting data at multiple stages throughout the page lifecycle, or due to user interactions, you may want to send that data on multiple beacons.

The main downside to multiple beacons is that it costs more from several perspectives: more JavaScript CPU time building the beacons, more network overhead sending the beacons, more server CPU time processing the beacons.

In addition, depending on how the back-end server infrastructure is set up, you may want to "link" or "stitch" those beacons together. For example, let’s say you’re interested in tracking the Load Time of a Page, as well as the final Cumulative Layout Shift Score. You may send a beacon out at the onload event with the Load Time, but wait until the unload event to send the final CLS Score.

Later, when you’re analyzing the data, you may want to group or compare Page Load times with their final CLS Scores. To do that, you would need to link the beacons together through some sort of GUID, and probably spend time on the back-end joining those beacons together (at your database layer).
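As a simplified illustration of that joining step, the sketch below merges partial beacons that share an ID. The pageViewId field name is an assumption; in the browser it could be generated once per page view (for example with crypto.randomUUID()) and included on every beacon. A real back-end would more likely do this join in the database.

```javascript
// Merges partial beacons that share the same pageViewId into one record.
// Later beacons overwrite earlier ones if they repeat a field.
function stitchBeacons(beacons) {
  var byPageView = new Map();

  beacons.forEach(function(beacon) {
    var merged = byPageView.get(beacon.pageViewId) || {};
    byPageView.set(beacon.pageViewId, Object.assign(merged, beacon));
  });

  return Array.from(byPageView.values());
}

// Example: an onload beacon and an unload beacon from the same page view
var combined = stitchBeacons([
  { pageViewId: 'a', loadTime: 1200 },
  { pageViewId: 'b', loadTime: 800 },
  { pageViewId: 'a', cls: 0.05 }
]);

console.log(combined.length); // 2
```

Note that page view 'b' never received its CLS beacon, which is exactly the kind of partial data you need to plan for.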

An alternative strategy, once the Page Load beacon arrives, is holding it in memory until the final CLS Score arrives, before "stitching" it together on the back-end and sending to the database as a "combined" beacon with all of the data of that Page Load Experience. Doing this would result in additional server complexity, memory usage, and probably less reliability. You’d also need to figure out what happens if one of the partial beacons never arrives (data gets lost in-transit all the time, and sometimes events like unload never fire).

If you’ll never be looking at or comparing the data from those multiple beacons, these concerns may not matter. But if you’re doing more advanced analytics where joining data from multiple beacons would be common, you should weigh the pros and cons of multiple beacons as part of your strategy.

Good for:

  • Real-time capturing/reporting of events, events don’t "wait" for a later beacon to be sent
  • Capturing data beyond a single event, throughout a Page Load lifecycle

Bad for:

  • Generally more processing time on the client (preparing the beacon)
  • Generally more network usage (HTTP protocol overhead, repeated dimensions or IDs to stitch to other beacons)
  • Generally more processing on the server (multiple incoming requests)
  • Harder to keep context of the same user experience together — multiple beacons may need to be "joined" for querying or held in-memory until they all arrive

Mechanisms

Once you’ve figured out when you’d like to send your beacon(s), and how many you’ll send, you need to convince the browser to send it. There are at least four common APIs for sending beacons: Image, XMLHttpRequest, sendBeacon() and the Fetch API.

Image

The simplest method of beaconing data is by using an HTML Image, commonly called a "pixel". This is generally done via an HTTP GET request by creating a hidden DOM Image, setting its Image.src, and including your beacon data in the query string.

Often, the server will respond with a 204 No Content or a simple/transparent 1×1 pixel image.

var img = new Image();
img.src = 'https://site.com/beacon/?a=1&b=2';

You can’t include any data in the "body" of the Image, as you only have the URL (query string) to work with. This limits you to how much actual data can be sent, depending on both the browser and server configuration.

From the browser’s point of view, most modern browsers support URL lengths of at least 64 KB:

  • Chrome: ~ 100 KB
  • Firefox (3.x): >= 5 MB
  • Firefox (recent): ~ 100 KB
  • Safari 4, 5: >= 5 MB
  • Safari 13: ~ 64 KB
  • Mobile Safari 13: ~ 64 KB
  • Internet Explorer 6, 7: 2083 bytes
  • Internet Explorer 8, 9, 10, 11: >= 5 MB
  • Edge (EdgeHTML 20-44): >= 5 MB
  • Edge (Chromium 79+): ~ 100 KB
  • Opera (Presto <= 12): >= 5 MB
  • Opera (Chromium): ~ 100 KB

Notably small exceptions are Internet Explorer 6 and 7 (… does anyone still care?).

One thing to keep in mind is that serializing data onto the URL is usually inefficient. Strings need to be URI-encoded, which bloats the size of characters due to "percent encoding". Especially if you’re trying to tack on raw JSON, like this:

{"abc":123,"def":"ghi"}

It gets expanded on the URL by 69% to:

%7B%22abc%22:123,%22def%22:%22ghi%22%7D
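You can check the growth for your own payloads with a couple of lines. Note that the string above matches what encodeURI() produces; encodeURIComponent(), which is what you would normally use for a query string value, also escapes ":" and "," and grows a bit more.

```javascript
var rawJson = JSON.stringify({ abc: 123, def: 'ghi' });

// encodeURI() leaves ":" and "," alone
var viaEncodeURI = encodeURI(rawJson);

// encodeURIComponent() escapes ":" and "," as well
var viaEncodeURIComponent = encodeURIComponent(rawJson);

console.log(rawJson.length);               // 23
console.log(viaEncodeURI.length);          // 39
console.log(viaEncodeURIComponent.length); // 45
```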

You may be able to minimize this type of bloat by using compression or things like JSURL.

The browser’s URL limits are just part of the story. Most web servers also have their own max request URL size:

  • Apache: Defaults to 8190 bytes and can be increased via the LimitRequestLine directive
  • Tomcat has a default limit of 8 KB, and can be increased up to 64 KB via maxHttpHeaderSize
  • Jetty has a default limit of 8 KB, and can be increased via requestHeaderSize
  • CDNs will have their own URL length limits, which are usually not configurable. Akamai, CloudFront and Fastly all seem to have limits around 8KB.
  • Users may have proxies installed that have their own limits

At the end of the day, it’s safest to limit Image beacon URLs to under 2,000 bytes if you care about Internet Explorer 6 and 7. If not, you can probably go up to 8,190 bytes; only go beyond that if you’ve specifically configured and tested all of the parts of your CDN and server infrastructure.
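A small guard like the following could enforce whichever limit you pick before creating the Image. buildImageBeaconUrl() is a made-up helper, and the 2,000 default simply mirrors the conservative limit above.

```javascript
// Builds an Image beacon URL from key-value data, or returns null
// if the resulting URL would be longer than maxLength.
function buildImageBeaconUrl(endpoint, data, maxLength) {
  maxLength = maxLength || 2000;

  // URLSearchParams takes care of encoding the names and values
  var query = new URLSearchParams(data).toString();
  var url = endpoint + '?' + query;

  return url.length <= maxLength ? url : null;
}

// In the browser (site.com is a placeholder):
//
// var url = buildImageBeaconUrl('https://site.com/beacon/', { a: 1, b: 2 });
// if (url) {
//   new Image().src = url;
// } else {
//   // too large for a URL: fall back to a mechanism that supports a body
// }
```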

I’m not specifically aware of any user proxies with URL limits, but my guess is there are some out there that may have limits around the same sizes (of 2 or 8 KB), so even if your server infrastructure supports longer request URLs, some users may not be able to send requests that long.

Image Beacon Pros:

  • Simplest API
  • Least amount of overhead
  • Largest browser support
  • Will not be rejected or delayed by CORS

Image Beacon Cons:

  • Does not support HTTP POST
  • Does not support any payload other than the URL
  • Does not support more than ~2 KB of data, depending on the browser
  • Not as reliable as sendBeacon()

XMLHttpRequest

Once the XMLHttpRequest (XHR) API was added to browsers, it created a way for developers to use the API to send raw data to any URL, instead of pretending we were fetching Images from everywhere.

XHRs are a lot more flexible than Image beacons. They can use any HTTP method, including POST. They can also include a body payload (of any Content-Type), so we can avoid the URL length concerns of Image beacons.

To avoid the CORS performance penalty of an OPTIONS Pre-Flight, you should make sure your XHR beacon is a simple request: only GET/POST/HEAD, no fancy headers, and a Content-Type of either:

  • application/x-www-form-urlencoded
  • multipart/form-data
  • text/plain
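If you build beacons in several places, a small check like this one can catch a configuration that would trigger a Pre-Flight. avoidsPreflight() is a name I made up, and it only looks at the method and Content-Type, so it assumes you haven’t set any other custom headers.

```javascript
var SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
var SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain'
];

// Returns true if the method and Content-Type qualify as a simple request
function avoidsPreflight(method, contentType) {
  // ignore parameters such as "; charset=UTF-8"
  var mimeType = contentType.split(';')[0].trim().toLowerCase();

  return SIMPLE_METHODS.indexOf(method.toUpperCase()) !== -1 &&
    SIMPLE_CONTENT_TYPES.indexOf(mimeType) !== -1;
}

console.log(avoidsPreflight('POST', 'text/plain; charset=UTF-8')); // true
console.log(avoidsPreflight('POST', 'application/json'));          // false
```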

Make sure to review the fallback strategies in case XMLHttpRequest isn’t available, or if it fails.

XHR allows you to send data synchronously or asynchronously. There’s really no reason to send synchronous XHRs these days. Some websites used to send synchronous XHRs on unload to make sure the beacon data was sent prior to the browser closing the page. These days, you should use sendBeacon() instead for even more reliability and better performance.

Here’s an example of using XHR to send a beacon with multiple key-value pairs:

// data to send
var data = {
    a: 1,
    b: 2
};

// open a POST
var xhr = new XMLHttpRequest();
xhr.open('POST', 'https://site.com/beacon/');
xhr.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');

// prepare to send our data as FORM encoded
var params = [];
for (var name in data) {
    if (data.hasOwnProperty(name)) {
        params.push(encodeURIComponent(name) + '=' + encodeURIComponent(data[name]));
    }
}

var paramsJoined = params.join('&');

// send!
xhr.send(paramsJoined);

XMLHttpRequest Beacon Pros:

  • Simple API
  • Supports HTTP POST and other methods
  • Supports a payload in the body of any content type
  • Supports any size payload (up to server limits)

XMLHttpRequest Beacon Cons:

  • May require consideration around CORS to avoid Pre-Flights
  • Not as reliable as sendBeacon()

sendBeacon

The navigator.sendBeacon(url, payload) API provides a mechanism to asynchronously send beacon data more performantly and reliably than using XMLHttpRequest or Image. When using the sendBeacon() API, even if the page is about to unload, the browser will make a best effort attempt to send the data. The request is always an HTTP POST.

sendBeacon() was built for telemetry, analytics and beaconing, and we should use it if available! According to caniuse.com, over 95% of browser marketshare supports sendBeacon() today (the end of 2020).

The API is fairly simple to use on its own, but has a few gotchas and limits.

First, the return value of navigator.sendBeacon() should be checked. If it returned true, you’ve successfully handed data off to the browser and you’re good to go! Note this doesn’t mean the data arrived at the server — you’ll never be able to see the server’s response to the beacon with the sendBeacon() API.

The sendBeacon() API will return false if the UA could not queue the request. This generally happens if the payload size has tripped over certain beacon limits that the browser has set for the page. Here’s what the Beacon API spec says about these limits:

The user agent imposes limits on the amount of data that can be sent via this API: this helps ensure that such requests are delivered successfully and with minimal impact on other user and browser activity. If the amount of data to be queued exceeds the user agent limit, this method returns false; a return value of true implies the browser has queued the data for transfer. However, since the actual data transfer happens asynchronously, this method does not provide any information whether the data transfer has succeeded or not.

In practice today, the following limits are observed:

  • Firefox does not appear to impose any limits
  • Chromium-based browsers and Safari have:
    • A payload size limit: this is defined in the Fetch API spec as 64 KB
    • An outstanding-beacon payload limit: if there are other navigator.sendBeacon() requests in progress (from any script), and the sum of their payload sizes is over 64 KB, the limit is breached
  • In Chrome versions earlier than 66, if the total size of previous calls to sendBeacon() was over 64 KB, subsequent calls would fail

Besides these limits, the URL itself could also contain data, and would adhere to the same URL limits seen in the Image beacon section.
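As a minimal sketch of working with those limits, you could measure the payload before handing it to sendBeacon(). The 64 KB constant mirrors the limit described above, and the names BEACON_LIMIT_BYTES, payloadByteLength and fitsInSendBeacon are hypothetical helpers, not part of any API:

```javascript
// Assumed limit: 64 KB, per the limits described above (hypothetical constant)
var BEACON_LIMIT_BYTES = 64 * 1024;

// Returns the UTF-8 byte length of a string payload
function payloadByteLength(payload) {
    return new TextEncoder().encode(payload).length;
}

// True if this payload alone is within the assumed sendBeacon() limit.
// It cannot see other in-flight beacons, so sendBeacon() may still return false.
function fitsInSendBeacon(payload) {
    return payloadByteLength(payload) <= BEACON_LIMIT_BYTES;
}
```

A check like this only helps you pick a transport up front; the return value of sendBeacon() is still the final word.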

If navigator.sendBeacon() returns false, it means the browser will not be sending the beacon. If so, it’s best to fall back to XMLHttpRequest or Image beacons.

This sample code will check that sendBeacon() exists and works, and if not, fall back to XHR/Image beacons:

function sendData(payload) {
    if (window &&
        window.navigator &&
        typeof window.navigator.sendBeacon === "function" &&
        typeof window.Blob === "function") {

        var blobData = new window.Blob([payload], {
            type: "application/x-www-form-urlencoded"
        });

        try {
            if (window.navigator.sendBeacon('https://site.com/beacon/', blobData)) {
                // sendBeacon was successful!
                return;
            }
        } catch (e) {
            // fallback below
        }
    }

    // Fallback to XHR or Image, passing along the same payload
    sendXhrOrImageBeacon(payload);
}

Note there are only 3 CORS safelisted Content-Types you can send:

  • application/x-www-form-urlencoded
  • multipart/form-data
  • text/plain

Any other content type will result in a CORS pre-flight for cross-origin requests, which isn’t desired for a beacon that you’re trying to get out reliably. So if you’re wanting to send application/json content to another domain, you may consider encoding it as just text/plain.
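Here is a rough sketch of that text/plain approach. The helper name jsonAsTextPlainBlob is hypothetical, and it assumes your server parses the body as JSON regardless of the Content-Type it receives:

```javascript
// Wraps a JSON-serializable object in a Blob labeled text/plain, so a
// cross-origin sendBeacon() stays a CORS "simple" request (no pre-flight).
function jsonAsTextPlainBlob(data) {
    return new Blob([JSON.stringify(data)], { type: "text/plain" });
}

// Browser usage (the URL is a placeholder):
//   navigator.sendBeacon("https://site.com/beacon/", jsonAsTextPlainBlob({ a: 1, b: 2 }));
```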

sendBeacon Pros:

  • Simple API, but beware of fallbacks
  • Most reliable
  • Should not be rejected or delayed by CORS (using the correct Content-Types)
  • Supports any size payload, though the browser may reject larger sizes (stick to under 64 KB)

sendBeacon Cons:

  • Calling it does not guarantee the API will "accept" the call; you may need to fall back to other methods
  • Only supports HTTP POST
  • Supports only some Content Types to avoid CORS pre-flight

Fetch API

Similar to using an XMLHttpRequest, the modern fetch() API could be used to send beacons. If you’re already using Fetch in your app, you could use that interchangeably with XMLHttpRequest as a fallback.

In addition, there’s a recent Fetch API option called keepalive: true. This option is likely what sendBeacon() is using under the hood in most browsers.

This is supported by Chrome 66+, Safari 11+, and is being considered by Firefox.

There are some caveats and limitations around using keepalive so I’d encourage you to review that issue if you’re using the Fetch API.

At this point, I’d suggest using sendBeacon() over the Fetch API.
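If you do end up using Fetch as a fallback, a keepalive request could look roughly like this. This is a sketch, not a recommendation: sendFetchBeacon is a hypothetical helper, and the fetchFn parameter only exists so the function can be exercised without a network:

```javascript
// Sends a best-effort beacon with fetch() and keepalive: true.
// fetchFn is optional and defaults to the global fetch.
function sendFetchBeacon(url, body, fetchFn) {
    fetchFn = fetchFn || fetch;
    return fetchFn(url, {
        method: "POST",
        body: body,
        keepalive: true,
        // text/plain is one of the CORS-safelisted Content-Types
        headers: { "Content-Type": "text/plain" }
    }).catch(function() {
        // beacons are best-effort, so network errors are swallowed
        return null;
    });
}
```

Unlike sendBeacon(), this gives you a Promise for the response, though you usually can’t rely on it resolving while the page is unloading.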

Fallback Strategies

Not every beaconing method is available in every browser. You’ll want to try to fall back to older methods if sendBeacon() isn’t available:

Generally, use:

  1. sendBeacon() if available (for reliability) and if it returns true
  2. XMLHttpRequest (or Fetch API) if you need to use HTTP POST or have a body payload or if the data is > 2 KB
  3. Image otherwise
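One way to express that ordering is a small helper that walks a list of transports until one accepts the payload. This is only a sketch: sendWithFallback is a hypothetical name, and each transport is assumed to be a function you write that returns true on success:

```javascript
// Tries each transport in order and stops at the first one that reports success.
// Returns the index of the transport that accepted the payload, or -1 if none did.
function sendWithFallback(payload, transports) {
    for (var i = 0; i < transports.length; i++) {
        try {
            if (transports[i](payload)) {
                return i;
            }
        } catch (e) {
            // this transport is unavailable or threw; try the next one
        }
    }
    return -1;
}
```

You would call it with your own wrappers in the order above, e.g. sendWithFallback(payload, [viaSendBeacon, viaXhr, viaImage]), where those three names are placeholders.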

Payload

What does your data look like? How big is it?

Ideally, you should minimize the outgoing request size as much as possible to avoid overtaxing your visitor’s network. To do this, you could consider various forms of data minification or compression.

Limits

It would be wise to first look at your expected minimum, median and maximum payload size. This may dictate what kind of beacon you can send, i.e. Image vs XMLHttpRequest vs sendBeacon(), and whether any sort of minification/compression is needed.

Briefly:

  • If your data is under 2 KB, you can use any type of beacon, and probably don’t need to compress it
  • If your data is under 8 KB, you can use any type of beacon, but won’t support IE 6 or 7
  • If your data is under 64 KB, you can use sendBeacon() or XMLHttpRequest, and you may want to consider compressing it
  • If your data is over 64 KB, you can only use XMLHttpRequest, and you may want to consider compressing it
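The list above could be written down as a lookup, if that helps when wiring up a beacon library. This is a sketch with hypothetical names (beaconOptionsForSize and its result fields), and it conservatively treats a payload of exactly 64 KB as "over":

```javascript
// Maps a payload size in bytes to the options from the list above.
function beaconOptionsForSize(bytes) {
    var KB = 1024;
    if (bytes < 2 * KB) {
        return { types: ["image", "sendBeacon", "xhr"], oldIE: true, considerCompression: false };
    }
    if (bytes < 8 * KB) {
        // any beacon type still works, but not in IE 6 or 7
        return { types: ["image", "sendBeacon", "xhr"], oldIE: false, considerCompression: false };
    }
    if (bytes < 64 * KB) {
        return { types: ["sendBeacon", "xhr"], oldIE: false, considerCompression: true };
    }
    // 64 KB and above: XMLHttpRequest only
    return { types: ["xhr"], oldIE: false, considerCompression: true };
}
```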

Payload via URL (Query String)

The simplest beacons can include all of their data in the Query String of a URL, i.e.:

https://mysite.com/beacon/?a=1&b=2...

As we saw with the Image beacon section, in practice this is limited to a total URL length of 2 KB (if you support IE 6/7) or 8 KB (unless your server infrastructure supports more).

One complication is that characters outside of the range below will need to be URI-encoded by encodeURIComponent:

A-Z a-z 0-9 - _ . ! ~ * ' ( )

Depending on your data, this could bloat the size of your URL significantly! You may want to consider JSURL or another compression technique to help offset this if you’re sticking to a URL payload.
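To get a feel for the bloat on your own data, you could compare the encoded and raw lengths. encodingBloat is a hypothetical helper, and it counts characters rather than bytes:

```javascript
// Ratio of URI-encoded length to raw length; 1 means no bloat.
// Expects a non-empty string.
function encodingBloat(value) {
    return encodeURIComponent(value).length / value.length;
}
```

For example, a small JSON string like {"a":1} grows from 7 characters to 17 once encoded, because the braces, quotes and colon all need escaping.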

Payload via Request Body

For XMLHttpRequest and sendBeacon calls, you’ll often specify the bulk of your data in the payload of the beacon (instead of the URL).

Common ways of encoding your beacon data include:

  • multipart/form-data via FormData, which is pretty inefficient for sending multiple small key-value pairs due to the "boundary" and Content-Disposition overhead:

    ------WebKitFormBoundaryeZAm2izbsZ6UAnS8
    Content-Disposition: form-data; name="a"
    
    1
    ------WebKitFormBoundaryeZAm2izbsZ6UAnS8
    Content-Disposition: form-data; name="b"
    
    2
    ------WebKitFormBoundaryeZAm2izbsZ6UAnS8--
  • application/x-www-form-urlencoded (via URLSearchParams), which suffers from the same percent-encoding bloat as URLs if you have many non-alphanumeric characters.
  • text/plain with whatever text content you want, if your server knows how to parse it

Any other content type may trigger a CORS pre-flight for cross-origin requests in XMLHttpRequest and sendBeacon.
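As a small sketch of the URLSearchParams option, this builds a form-encoded body from a flat object. formEncode is a hypothetical helper name:

```javascript
// Builds an application/x-www-form-urlencoded body from a flat object.
function formEncode(data) {
    var params = new URLSearchParams();
    for (var name in data) {
        if (Object.prototype.hasOwnProperty.call(data, name)) {
            params.append(name, String(data[name]));
        }
    }
    return params.toString();
}
```

If you pass a URLSearchParams object itself (rather than its string form) to sendBeacon() or xhr.send(), the browser sets the form-encoded Content-Type for you.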

Compression

You may want to consider reducing the size of your URL or Body payloads, if possible. There are always trade-offs in doing so, as minification/compression generally use CPU (JavaScript) to reduce outgoing byte sizes.

Some common techniques include:

  • Using a data-specific compression technique to reduce or minify data. We have some examples for data compression in Boomerang for ResourceTiming and UserTiming.
  • URL and application/x-www-form-urlencoded body payloads can benefit from being minified by JSURL, which swaps out characters that must be encoded for URL-safe characters.
  • The Compression Streams API could be used to compress large payloads for browsers that support it
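For the Compression Streams option, a minimal sketch could look like the following. gzipString is a hypothetical helper, and it assumes your server knows that the body it receives is gzip-compressed, which is not something the browser will negotiate for you:

```javascript
// Gzips a string with the Compression Streams API, resolving to a Blob.
// Resolves to null where CompressionStream is unavailable.
function gzipString(text) {
    if (typeof CompressionStream !== "function") {
        return Promise.resolve(null);
    }
    var stream = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
    return new Response(stream).blob();
}
```

Since compression is asynchronous, this fits better for beacons prepared ahead of time than for data gathered inside an unload-style handler.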

Reliability

As described above, there are many different stages of the page lifecycle at which you can send data. Often, you’ll want to send data during one of the lifecycle events like onload or unload.

Browsers give us a lot of lifecycle events to listen to, and depending on which of these events you use, you may be more-or-less likely to receive data if you send a beacon then.

Let’s look at some examples, and find a strategy for when to send our beacons, so we can have the best reliability of the data reaching our servers.

Methodology

I recently ran a study on one of my websites, collecting data over a week from a large set (millions+) of Page Loads.

For each of these visitors, I sent multiple beacons: as soon as the page started up, at onload, during unload and several other events.

The goal was to see how reliable beaconing is at each of those events, and to see what combination of events would be the most reliable way to receive beacons.

The percentages below reflect how frequently a beacon arrived if sent during that event, as compared to the "startup" beacon that was sent as soon as the page’s <head> was parsed.

This test was done on a single site so results from other sites will differ.

Page Load (onload) Event

Besides sending a beacon as soon as the page starts up, the most frequent opportunity to send data is the window load event (aka onload).

onload event

When sending data just at onload, beacons arrive only 86.4% of the time (on this site).

This of course varies by browser:

onload event - by browser

A large percentage of those "missing" beacons are due to page abandons, i.e. when the visitor leaves before the onload event has fired.

This abandon rate will vary by site, but for this particular site, nearly 14% of visits would not be tracked if you only listened to onload.

Thus, if your data requires waiting until the onload event, you should also listen to page lifecycle "unload" events, to get the opportunity to send a beacon if the user is leaving the page. See avoiding abandons below.

Delayed Page Load (onload) Event

Sometimes, you may not want to send data immediately at the onload event. It could make sense to wait a little bit.

You could consider waiting a pre-defined amount of time, say 1 or 5 or 10 seconds after onload before sending the beacon.

Alternatively, if you have page components that are delay-loaded until the onload event, you may want to wait until they load to measure them.

Any amount of time you’re waiting beyond the Page Load will decrease beacon rates, unless you’re also listening to unload events (see below).

For example, artificially adding a delay after onload before sending the beacon resulted in a clear drop-off of reliability:

Waiting N seconds after onload to send a beacon

Again, these rates are if you only listen to the onload (and send a beacon N seconds after that) — you’d ideally pair this with avoiding abandons below to make sure you send a beacon if the visitor leaves first.
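A sketch of that pairing: wait N milliseconds after load, but send right away if the page is hidden first. beaconAfterLoad and its parameters are hypothetical; target would normally be window, and send is whatever function transmits your beacon. For brevity this only listens for pagehide as the abandon signal:

```javascript
// Sends once: either delayMs after "load", or at "pagehide" if that comes first.
function beaconAfterLoad(target, delayMs, send) {
    var sent = false;
    var timer = null;

    function sendOnce() {
        if (sent) {
            return;
        }
        sent = true;
        clearTimeout(timer);
        send();
    }

    target.addEventListener("load", function() {
        timer = setTimeout(sendOnce, delayMs);
    });

    // abandon path: the visitor left before the delay elapsed
    target.addEventListener("pagehide", sendOnce);

    return sendOnce;
}
```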

Unload Events

There are several events that are all related to the page "unloading", such as visibilitychange, pagehide, beforeunload, and unload. They are all used for specific purposes, and not all browsers support each event.

unload and beforeunload are two events that are fired as the page is being unloaded:

  • beforeunload happens first, and gives JavaScript the opportunity to cancel the unload
  • unload happens next, and there is no turning back

While the unload and beforeunload events have been with us since the beginning of the web, they’re not the most reliable events to use for beaconing:

onload event

The unload event is significantly more reliable than the beforeunload event. This discrepancy is primarily due to browser differences:

unload event - by browser
beforeunload event - by browser

Notably, on Safari Mobile, beforeunload is not fired at all (while unload is).

pagehide and visibilitychange are more "modern" events:

  • visibilitychange can happen when a user switches to another tab (so the current tab is not unloading yet). This may not be the time you want to send a beacon, as a change to hidden doesn’t preclude the page coming back to visible later — the user hasn’t navigated away, just gone away (possibly) temporarily. But it’s possibly the last opportunity you’ll have to send data, so it’s a good time to send a beacon if you can.
  • pagehide was introduced as a more reliable "this page is going away" event than the original unload events, which have some caveats and scenarios where they aren’t expected to fire.

Here’s how often beacons sent during those events arrived:

onload event

As seen above, we find pagehide (the modern version of unload) to be slightly more reliable than unload (74.8% vs. 72.2%). visibilitychange (hidden) alone doesn’t send beacons as often, but if combined with pagehide events, we’re up to 82.3% reliability which is superior to the combined 73.4% of beforeunload|unload.

By browser:

pagehide event - by browser
visibilitychange event - by browser

Not coincidentally, listening for these two events pagehide and visibilitychange to save state or to send a beacon is the recommendation from Ilya Grigorik from back in 2015. This is still a great recommendation. However, if you’re sending only a single beacon (and not just saving state), I recommend considering the trade-offs of attempting to beacon earlier in the process.

Below are all of the unload-style events in a single chart. If for some reason you want to listen to all of these events, you gain the most reliability (82.94%):

onload event

Listening to all events gives you 0.64% more reliability (82.94%) than just pagehide/visibilitychange (at 82.3%).

However, there is a major downside to registering for the unload handler: it breaks BFCache in Chrome, Safari and Firefox! BFCache is a browser performance optimization that’s been available in Firefox and Safari for a while, and was recently added to Chrome 86+. The beforeunload handler also breaks BFCache in Firefox.

Depending on your site (or if you’re a third-party analytics provider), you should consider the trade-off of more beacons vs. breaking BFCache when deciding which events to listen for.

Note: Not all browsers support pagehide or visibilitychange, so you’ll want to detect support for those and if not, fall back to listening for unload and beforeunload as well.

Wrapping this all together, here’s my recommendation for listening for unload-style events to get the most reliability:

// pseudo-code

// prefer pagehide to unload events
if ('onpagehide' in self) {
    addEventListener('pagehide', sendBeacon, { capture: true} );
} else {
    // only register beforeunload/unload in browsers that don't support
    // pagehide to avoid breaking bfcache
    addEventListener('unload', sendBeacon, { capture: true} );
    addEventListener('beforeunload', sendBeacon, { capture: true} );
}

// visibilitychange may be your last opportunity to beacon,
// though the user could come back later
addEventListener('visibilitychange', function() {
    if (document.visibilityState === 'hidden') {
        sendBeacon();
    }
}, { capture: true} );

Avoiding Abandons

If your primary beaconing event is the Page Load (onload) event, but you want to also respond to users abandoning the page before the page reaches onload, you’ll want to combine listening for both onload and Unload events.

When the page is abandoned prematurely, the page may not have all of the data you track for "full" navigations. However, there are often useful things you’ll still want to track, such as:

  • That the Page Load happened at all
  • Characteristics of the page, user, browser
  • What "phase" of the Page Load they reached

Combining onload plus the two recommended Unload events pagehide and visibilitychange (hidden) gives you the best possible opportunity for tracking the Page Load:

Avoiding Abandons

By listening to those three events, we see beacons arriving 92.6% of the time.

This rate:

  • Decreases by just 0.6% to 92.0% if you don’t listen for visibilitychange (if you don’t want to beacon if the user might come back after a tab switch)
  • Increases by just 0.2% to 92.8% if you listen for beforeunload (which would break BFCache in Firefox)
  • Does not increase in any meaningful way if you also listened for unload (which breaks BFCache anyway).

By browser:

Avoiding Abandons

Notably, Safari and Safari Mobile seem less reliable for measuring, likely due to not firing the pagehide and visibilitychange events as often.

So if your primary use case is just sending out one beacon by the onload (or Unload) event, this is how I’d combine the handlers:

// pseudo-code

// prefer pagehide to unload event
if ('onpagehide' in self) {
    addEventListener('pagehide', sendBeacon, { capture: true} );
} else {
    // only register beforeunload/unload in browsers that don't support
    // pagehide to avoid breaking bfcache
    addEventListener('unload', sendBeacon, { capture: true} );
    addEventListener('beforeunload', sendBeacon, { capture: true} );
}

// visibilitychange may be your last opportunity to beacon,
// though the user could come back later
addEventListener('visibilitychange', function() {
    if (document.visibilityState === 'hidden') {
        sendBeacon();
    }
}, { capture: true} );

// send data at load!
addEventListener('load', sendBeacon, { capture: true} );

// track if we've sent this beacon or not
var sentBeacon = false;
function sendBeacon() {
    if (sentBeacon) {
        return;
    }

    // 1. call navigator.sendBeacon or XHR or Image
    // 2. cleanup after yourself, e.g. handlers

    sentBeacon = true;
}

One Beacon Trade-offs

Many analytics scripts prefer to send a single beacon. Taking boomerang as an example, we measure the performance of the user experience up to the Page Load (onload) event, and attempt to send our performance beacon immediately afterwards.

There are some continuous performance metrics, such as Cumulative Layout Shift (CLS) where it may be desirable to continue measuring the metric throughout the page’s lifetime, right up to the unloading of the page. Doing so would track the "full page" CLS score, instead of just the CLS score snapshotted at the onload event.

There’s an inherent trade-off when trying to decide to send a beacon immediately (at onload) instead of waiting until the unload event. Sending earlier is better for reliability, sending later is better for measuring "more" of the user experience.

Through this study we were able to quantify what this trade-off is (at least for the study’s website):

So the "cost" of sending a single beacon at Unload instead of Page Load is that about 10% of beacons don’t arrive. Depending on your priorities, that decrease in beacons may be worth it in order to measure for "longer" before you send your data.

One important thing to remember when some beacons don’t arrive is that their characteristics may not be evenly distributed. In other words, those 10% of beacons may be more frequently "good" experiences, or "bad" experiences, or a particular class of devices or browsers. Those missing beacons aren’t a representative sample of the entire class of visitors, and could be hiding some real issues!

Bringing it back to Ilya’s advice about saving app state via the unloading events: this is still suitable if you’re saving app state or sending multiple beacons, but I’d suggest considering the reliability drop-off of not sending the beacon earlier, depending on the data you’re measuring.

Advanced Techniques

If your goal is to capture as many user experiences as possible, there are a few more things you can try.

Persisting Beacon Data in Local Storage

If your goal is to send a single beacon, and you want to wait as long as possible to send it, you may want to only register for Unload events.

Since not beaconing earlier has a trade-off of being less reliable, you could consider temporarily storing your upcoming beacon data into localStorage until you send it.

If your Unload events fire properly and you’re able to send a beacon, great! You can remove that data from localStorage too.

However, if your application starts up and finds orphan beacon data from a previous Page Load, you could send it on that page instead.

This works best if you’re concerned about losing data for users navigating across your site — obviously if a user navigates away to another website, you may never get the opportunity to send data again (unless they come back later).
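A rough sketch of that persist-and-recover flow follows. All of the names here (STORAGE_KEY, persistBeacon, clearPersistedBeacon, takeOrphanedBeacon) are hypothetical, and the storage argument would normally be window.localStorage:

```javascript
var STORAGE_KEY = "pendingBeacon"; // hypothetical key name

// Saves the latest beacon data so a later page can recover it.
function persistBeacon(storage, data) {
    storage.setItem(STORAGE_KEY, JSON.stringify(data));
}

// Call this once the beacon was sent successfully.
function clearPersistedBeacon(storage) {
    storage.removeItem(STORAGE_KEY);
}

// On startup: returns orphaned data from a previous page (or null) and clears it.
function takeOrphanedBeacon(storage) {
    var raw = storage.getItem(STORAGE_KEY);
    if (raw === null) {
        return null;
    }
    storage.removeItem(STORAGE_KEY);
    try {
        return JSON.parse(raw);
    } catch (e) {
        return null; // corrupt entry; drop it
    }
}
```

In a real page you’d also want to wrap the storage calls in try/catch, since localStorage can be disabled or full.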

Service Workers

You could also consider using a ServiceWorker as a "network buffer" for your beacon data.

If your goal is to send a single beacon but want to wait until as late as possible, you can reduce some of the reliability trade-offs by "sending" the data to a ServiceWorker for the domain, and letting it transmit at its leisure.

You could have a communications channel with your ServiceWorker where you keep updating its beacon data throughout the page’s lifetime, and rely on the ServiceWorker to send when it detects the user is no longer on the page.

The reason this works is often a ServiceWorker will persist beyond the page’s lifetime, even if the user navigates to another domain entirely. This won’t work if the browser is closed (or crashes), but ServiceWorkers often live a little beyond the page unload.

Using a ServiceWorker would be best suited for first-party beacons (i.e. capturing data on your own site) — most third-party analytics tools would have a hard time convincing a domain to install a ServiceWorker just to improve their beacon reliability.
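The page side of such a channel might look like the sketch below. updateWorkerBeacon and the "beacon-update" message type are hypothetical, swContainer would normally be navigator.serviceWorker, and how the worker decides the page is gone is left open here:

```javascript
// Page side: hands the latest beacon data to the controlling ServiceWorker.
// Returns false if no ServiceWorker controls this page.
function updateWorkerBeacon(swContainer, data) {
    if (!swContainer || !swContainer.controller) {
        return false;
    }
    swContainer.controller.postMessage({ type: "beacon-update", data: data });
    return true;
}

// Worker side (inside the ServiceWorker script), sketched as a comment:
//   var latest = null;
//   self.addEventListener("message", function(e) {
//       if (e.data && e.data.type === "beacon-update") {
//           latest = e.data.data;
//       }
//   });
//   // ...later, fetch() `latest` to your endpoint once the page is gone.
```

Keep in mind the browser may stop an idle ServiceWorker, so data held only in a worker variable can be lost.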

Misc

Cleanup

After you’ve successfully sent your data, it’s a good opportunity to consider cleaning up after yourself if you don’t anticipate any additional work.

For example, you could:

  • Remove any event listeners, such as click handlers or unload events
  • Discard any shared state (local variables)

You may not need to do this if you’re sending a beacon as the result of an unload event firing, but if you’re sending data earlier in the Page Load process, make sure your JavaScript won’t continue doing work even though it’ll never send a beacon again.
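One way to make cleanup easy is to register handlers through a helper that returns its own undo function. This is a sketch; listenAll is a hypothetical name and target would normally be window:

```javascript
// Registers one handler for several event types and returns a function
// that removes all of those registrations again.
function listenAll(target, types, handler) {
    var options = { capture: true };
    types.forEach(function(type) {
        target.addEventListener(type, handler, options);
    });
    return function cleanup() {
        types.forEach(function(type) {
            target.removeEventListener(type, handler, options);
        });
    };
}
```

You’d keep the returned function around and call it right after the beacon has been sent.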

During Prerender or when Hidden?

You should consider whether it makes sense for you to send a beacon if the user hasn’t seen the page yet.

The most likely scenario is when the page is loaded completely hidden. This can happen when a user opens a link into a new (background) tab, or loads a page and tabs/switches away before it loads.

Is this experience something you want to track? Does the experience matter if the user never saw the page? If you do want to send a beacon, do you send it at onload or wait until the page becomes visible first? These are all questions you should consider when capturing telemetry.

In Boomerang for example, we still measure those "Always Hidden" user experiences (where the user never sees the page before onload), and send a beacon right away. However, the beacon is also tagged with a special parameter, so the back-end (like mPulse) can "bucket" those user experiences so they can be excluded (or reviewed independently) from regular Page Loads.
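A simplified sketch of detecting that "Always Hidden" case follows. This is not Boomerang’s actual implementation; trackVisibility is a hypothetical helper, and doc would normally be document:

```javascript
// Starts tracking visibility and returns a function that reports whether
// the page has been hidden the whole time since tracking started.
function trackVisibility(doc) {
    var everVisible = doc.visibilityState === "visible";
    doc.addEventListener("visibilitychange", function() {
        if (doc.visibilityState === "visible") {
            everVisible = true;
        }
    });
    return function alwaysHidden() {
        return !everVisible;
    };
}
```

You’d start tracking as early as possible, then check alwaysHidden() when building the onload beacon and add your own parameter to it if it returns true.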

There used to be some user agents that would also implement a "prerender" mode, but that was abandoned a few years ago. There’s a new privacy-focused prerender proposal that may come back at some point that you should consider similar to the "hidden" case above.

The Future

Because of the limitations we mentioned in this article around the trade-offs for a "one beacon" approach versus its reliability, there have been recent discussions around using something like the Reporting API as a better "beacon data queuing mechanism" that would reliably send your beacon data when the user leaves the page.

You can see a presentation from Yoav Weiss from this year’s (2020) W3C WebPerf TPAC event.

This could enable better capturing of continuous metrics (like CLS) via a single beacon sent just at the end of the Page Load in a reliable way.

Hoping the discussion continues!

TL;DR Summary

There are many reasons why and when you may want to send beacons, but here are some high level tips:

  • Use navigator.sendBeacon() when possible, but listen to its return value and fall back to XMLHttpRequest or Image beacons when needed
  • Send your beacon(s) as early as possible to ensure as many can reach your endpoints
  • If you’re waiting for a specific event to send your beacon, like Page Load, make sure you also have an abandonment strategy
  • There are several browser events that happen near the unloading of a page — listen to pagehide and visibilitychange (hidden) (and not unload or beforeunload which can break BFCache)
  • Be aware of your content and look for ways of minimizing payload size via compression or other means if it makes sense

Finally, we started this research by looking into our own beaconing strategy in Boomerang. We’ve found a few key changes we should make:

  • We currently listen for the unload and beforeunload events to try to make sure we capture all abandons/unloads. This is not only unnecessary (it does not meaningfully increase reliability rate), it also breaks BFCache in nearly all modern browsers
  • We do not currently listen for visibilitychange (hidden) to send our beacon, and we should consider it as it would increase our reliability (by 0.6 percentage points)
  • Boomerang generally sends its Page Load beacon right at onload if possible, as we were concerned with losing measurements if we waited later. This study found we’d miss around 10% of all Page Loads if we only sent our beacon during Unload instead. This may be a tradeoff some RUM customers want, so we can add that as an option.