When Third Parties Stop Being Polite… and Start Getting Real

July 23rd, 2018

At Fluent 2018, Charlie Vazac and I gave a talk titled When Third Parties Stop Being Polite… and Start Getting Real. Here’s the abstract:

Would you give the Amazon Prime delivery robot the key to your house, just because it stops by to deliver delicious packages every day? Even if you would, do you still have 100% confidence that it wouldn’t accidentally drag in some mud, let the neighbor in, steal your things, or burn your house down? Worst-case scenarios such as these are what you should be planning for when deciding whether or not to include third-party libraries and services on your website. While most libraries have good intentions, by including them on your site, you have given them complete control over the kingdom. Once on your site, they can provide all of the great services you want—or they can destroy everything you’ve worked so hard to build.

It’s prudent to be cautious: we’ve all heard stories about how third-party libraries have caused slowdowns, broken websites, and even led to downtime. But how do you evaluate the actual costs and potential risks of a third-party library so you can balance that against the service it provides? Every library requires nonzero overhead to provide the service it claims. In many cases, the overhead is minimal and justified, but we should quantify it to understand the real cost. In addition, libraries need to be carefully crafted so they can avoid causing additional pain when the stars don’t align and things go wrong.

Nic Jansma and Charles Vazac perform an honest audit of several popular third-party libraries to understand their true cost to your site, exploring loading patterns, SPOF avoidance, JavaScript parsing, long tasks, runtime overhead, polyfill headaches, security and privacy concerns, and more. From how the library is loaded, to the moment it phones home, you’ll see how third-parties can affect the host page and discover best practices you can follow to ensure they do the least potential harm.

With all of the great performance tools available to developers today, we’ve gained a lot of insight into just how much third-party libraries are impacting our websites. Nic and Charles detail tools to help you decide if a library’s risks and unseen costs are worth it. While you may not have the time to perform a deep dive into every third-party library you want to include on your site, you’ll leave with a checklist of the most important best practices third-parties should be following for you to have confidence in them.

You can watch the presentation on YouTube or catch the slides.

ResourceTiming Visibility: Third-Party Scripts, Ads and Page Weight

March 27th, 2018

Table Of Contents

  1. Introduction
  2. How to gather ResourceTiming data
  3. How cross-origins play a role
    3.1 Cross-Origin Resources
    3.2 Cross-Origin Frames
  4. Why does this matter?
    4.1 Third-Party Scripts
    4.2 Ads
    4.3 Page Weight
  5. Real-world data
  6. Workarounds
  7. Summary

1. Introduction

ResourceTiming is a specification developed by the W3C Web Performance working group, with the goal of exposing accurate performance metrics about all of the resources downloaded during the entire user experience, such as images, CSS and JavaScript.

For a detailed background on ResourceTiming, you can see my post on ResourceTiming in Practice.

One important caveat when working with ResourceTiming data is that in most cases, it will not give you the full picture of all of the resources fetched during a page load.

Why is that? Four reasons:

  1. Assets served from a different domain — known as cross-origin resources — have restrictions applied to them (on timing and size data), for privacy and security reasons, unless they have the Timing-Allow-Origin HTTP response header
  2. Each <iframe> on the page keeps track of the resources that it fetched, and the frame.performance API that exposes the ResourceTiming data is blocked in a cross-origin <iframe>
  3. Some asset types (e.g. <video> tags) may be affected by browser bugs where they don’t generate ResourceTiming entries
  4. If the number of resources loaded in a frame exceeds 150, and the page hasn’t called performance.setResourceTimingBufferSize() to increase the buffer size, ResourceTiming data will be limited to the first 150 resources

The combination of these restrictions — some in our control, some out of our control — leads to an unfortunate caveat when analyzing ResourceTiming data: In the wild, you will rarely be able to access the full picture of everything that was fetched on the page.

Given these blind spots, you may have a harder time answering these important questions:

  • Are there any third-party JavaScript libraries that are significantly affecting my visitor’s page load experience?
  • How many resources are being fetched by my advertisements?
  • What is my Page Weight?

We’ll explore what ResourceTiming Visibility means for your page, and how you can work around these limitations. We will also sample ResourceTiming Visibility data across the web by looking at the Alexa Top 1000, to see how frequently resources get hidden from our view.

But first, some background:

2. How to gather ResourceTiming Data

ResourceTiming is a straightforward API to use: window.performance.getEntriesByType("resource") returns a list of all resources fetched in the current frame:

var resources = window.performance.getEntriesByType("resource");

/* eg:
[
    {
        name: "https://www.foo.com/foo.png",
        entryType: "resource",
        startTime: 566.357000003336,
        duration: 4.275999992387369,
        initiatorType: "img",
        redirectEnd: 0,
        redirectStart: 0,
        fetchStart: 566.357000003336,
        domainLookupStart: 566.357000003336,
        domainLookupEnd: 566.357000003336,
        connectStart: 566.357000003336,
        secureConnectionStart: 0,
        connectEnd: 566.357000003336,
        requestStart: 568.4959999925923,
        responseStart: 569.4220000004862,
        responseEnd: 570.6329999957234,
        transferSize: 10000,
        encodedBodySize: 10000,
        decodedBodySize: 10000,
    }, ...
]
*/

There is a bit of complexity added when you have frames on the site, e.g. from third-party content such as social widgets or advertising.

Since each <iframe> has its own buffer of ResourceTiming entries, you will need to crawl all of the frames on the page (and sub-frames, etc), and join their entries to the main window’s.

This gist shows a naive way of doing this. For a version that deals with all of the complexities of the crawl, such as adjusting resources in each frame to the correct startTime, you should check out Boomerang’s restiming.js plugin.
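
To illustrate the general approach, here’s a minimal sketch of such a crawl (the function name is mine; unlike Boomerang’s plugin, it doesn’t adjust each frame’s startTime to the top-level window’s time origin):

function collectResources(frame) {
  var entries = [];

  try {
    // Cross-origin frames throw (or hide performance) when accessed
    if (frame.performance && frame.performance.getEntriesByType) {
      entries = frame.performance.getEntriesByType("resource");
    }

    // Recurse into any child frames we can reach
    for (var i = 0; i < frame.frames.length; i++) {
      entries = entries.concat(collectResources(frame.frames[i]));
    }
  } catch (e) {
    // Cross-origin frame: its resources are invisible to us
  }

  return entries;
}

var allResources = collectResources(window);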

3. How cross-origins play a role

The main challenge with ResourceTiming data is that cross-origin resources and cross-origin frames will affect the data you have access to.

3.1 Cross-Origin Resources

For each window / frame on your page, a resource will either be considered “same-origin” or “cross-origin”:

  • same-origin resources share the same protocol, hostname and port of the page
  • cross-origin resources have a different protocol, hostname (or subdomain) or port from the page

ResourceTiming data includes all same-origin and cross-origin resources, but there are restrictions applied to cross-origin resources specifically:

  • Detailed timing information will always be 0:
    • redirectStart
    • redirectEnd
    • domainLookupStart
    • domainLookupEnd
    • connectStart
    • connectEnd
    • requestStart
    • responseStart
    • secureConnectionStart
    • That leaves you with just startTime and responseEnd containing timestamps
  • Size information will always be 0:
    • transferSize
    • encodedBodySize
    • decodedBodySize

For cross-origin assets that you control (e.g. images on your own CDN), you can add a special HTTP response header to force the browser to expose this information:

Timing-Allow-Origin: *

Unfortunately, for most websites, you will not be in control of all of the third-party (and cross-origin) assets being served to your visitors. Some third-party domains have been adding the Timing-Allow-Origin header to their responses, but not all — current usage is around 13%.
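
You can spot these restricted entries at runtime: since the detailed timestamps are forced to 0, a simple check like the following sketch can classify each entry (this is a heuristic of my own; requestStart can legitimately be 0 in a few edge cases, so treat it as an approximation):

// Heuristic: restricted (cross-origin, no TAO) entries report 0 for
// the detailed timestamps, such as requestStart
function getVisibility(entry) {
  return entry.requestStart > 0 ? "full" : "restricted";
}

window.performance.getEntriesByType("resource").forEach(function(entry) {
  console.log(getVisibility(entry), entry.name);
});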

3.2 Cross-Origin Frames

In addition, every <iframe> you have on your page will have its own list of ResourceTiming entries. You will be able to crawl any same-origin and anonymous frames to gather their ResourceTiming entries, but cross-origin frames block access to the contentWindow (and thus the frame.performance) object — so any resources fetched in cross-origin frames will be invisible.

Note that adding Timing-Allow-Origin to a cross-origin <iframe> (HTML) response will give you full access to the ResourceTiming data for that HTML response, but will not allow you to access frame.performance, so all of the <iframe>‘s ResourceTimings remain unreachable.

To recap:

ResourceTiming Visibility Diagram

4. Why does this matter?

Why does this all matter?

When you, a developer, are looking at a browser’s networking panel (such as Chrome’s Network tab), you have full visibility into all of the page’s resources:

Chrome Network Panel

Unfortunately, if you were to use the ResourceTiming API at the same time, you may only be getting a subset of that data.

ResourceTiming isn’t giving you the full picture.

If we look at ResourceTiming data alone, without the visibility into fetched resources that browser developer tools give us, we may be misleading ourselves about what is really happening in the wild.

Let’s look at a couple of scenarios:

4.1 Third-Party Scripts

ResourceTiming has given us fantastic insight into the resources that our visitors are fetching when visiting our websites. One of the key benefits of ResourceTiming is that it can help you understand what third-party libraries are doing on your page when you’re not looking.

Third-party libraries cover the spectrum of:

  • JavaScript libraries like jQuery, Modernizr, Angular, and marketing/performance analytics libraries
  • CSS libraries such as Bootstrap, animate.css and normalize.css
  • Social widgets like the Facebook, Twitter and Google+ icons
  • Fonts like FontAwesome, Google Fonts
  • Ads (see next section)

Each of the above categories brings in different resources. A JavaScript library may come alone, or it may fetch more resources and send a beacon. A CSS library may trigger downloads of images or fonts. Social widgets will often create frames on the page that load even more scripts, images and CSS. Ads may bring in 100 megabytes of video just because.

All of these third-party resources can be loaded in a multitude of ways as well. Some may be embedded (bundled) directly into your site; some may be loaded from a CDN (such as cdnjs.com); some libraries may be loaded directly from a service (such as Google Fonts or Akamai mPulse).

The true cost of each library is hard to judge, but ResourceTiming can give us some insight into the content being loaded.

It’s also important to remember that just because a third-party library behaves one way on your development machine (when you have Developer Tools open) doesn’t mean it will behave the same way for other users, on different networks, with different cookies set, or when it detects your Developer Tools aren’t open. It’s always good to keep an eye on the cost of your third-party libraries in the wild, making sure they are behaving as you expect.

So how can we use ResourceTiming data to understand third-party library behavior?

  • We can get detailed timing information: how long it takes to resolve DNS, establish a TCP connection, wait for the first byte, and download third-party libraries. All of this information can help you judge the cost of a third-party library, especially if it is a render-blocking critical resource like non-async JavaScript, CSS or fonts.
  • We can understand the weight (byte cost) of the third party, and the dependencies it brings in, by looking at the transferSize. We can also see if it’s properly compressed by comparing encodedBodySize to decodedBodySize (see the sketch after this list).
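
Here’s a rough sketch of what that analysis might look like for a fully-visible entry (the function name and phase breakdown are my own; restricted entries would report 0 for most of these fields):

// Break a fully-visible ResourceTiming entry into its network phases,
// and check whether the response was compressed
function analyzeEntry(entry) {
  return {
    url: entry.name,
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcp: entry.connectEnd - entry.connectStart,  // includes SSL time
    ttfb: entry.responseStart - entry.requestStart,
    download: entry.responseEnd - entry.responseStart,
    transferBytes: entry.transferSize,
    // a ratio > 1 means the response was served compressed
    compressionRatio: entry.encodedBodySize > 0 ?
      entry.decodedBodySize / entry.encodedBodySize : null
  };
}

var scripts = window.performance.getEntriesByType("resource")
  .filter(function(e) { return e.initiatorType === "script"; })
  .map(analyzeEntry);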

In order to get the detailed timing information and weight of third-party libraries, they need to be either served same-domain (bundled with your other assets), or with a Timing-Allow-Origin HTTP response header. Otherwise, all you get is the overall duration (without DNS, TCP, request and response times), and no size information.

Take the below screenshot as an example. In it, there are third-party (cross-origin) images on vgtstatic.com that have Timing-Allow-Origin set, so we get a detailed breakdown of how long they took. We can determine the Blocking, DNS, TCP, SSL, Request and Response phases of the requests. We see that many of these resources were Blocked (the grey phase) due to connection limits to *.vgtstatic.com. Once TCP (green) was established, the Request (yellow) was quick and the Response (purple) was nearly instant:

ResourceTiming Waterfall with Cross-Origin Resources

Contrast all of this information with a cross-origin request to http://www.google-analytics.com/collect. This resource does not have Timing-Allow-Origin set, so we only see its total Duration (deep blue). For some reason this beacon took 648 milliseconds, and it is impossible for us to know why. Was it delayed (Blocked) by other requests to the same domain? Did google-analytics.com take a long time to first byte? Was it a huge download? Did it redirect from http:// to https://?

Since the URL is cross-origin without Timing-Allow-Origin, we also do not have any byte information about it, so we don’t know how big the response was, or if it was compressed.

The great news is that many of the common third-party domains on the internet are setting the Timing-Allow-Origin header, so you can get full details:

> GET /analytics.js HTTP/1.1
> Host: www.google-analytics.com
>
< HTTP/1.1 200 OK
< Strict-Transport-Security: max-age=10886400; includeSubDomains; preload
< Timing-Allow-Origin: *
< ...

However, there are still some very common scripts that show up without Timing-Allow-Origin (in the Alexa Top 1000), even when they’re setting TAO on other assets:

  • https://connect.facebook.net/en_US/fbevents.js (200 sites)
  • https://sb.scorecardresearch.com/beacon.js (133 sites)
  • https://d31qbv1cthcecs.cloudfront.net/atrk.js (104 sites)
  • https://www.google-analytics.com/plugins/ua/linkid.js (54 sites)

Also remember that any third-party library that loads a cross-origin frame will be effectively hiding all of its cost from ResourceTiming. See the next section for more details.

4.2 Ads

Ads are a special type of third-party that requires some additional attention.

Most advertising on the internet loads rich media such as images or videos. Most often, you’ll include an ad platform on your website by putting a small JavaScript snippet somewhere on your page, like the example below:

<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle" style="display:block;width: 120px; height: 600px" data-ad-client="ca-pub-123"></ins>
<script>(adsbygoogle = window.adsbygoogle || []).push({});</script>

Looks innocent enough, right?

Once the ad platform’s JavaScript is on your page, it’s going to bootstrap itself and try to fetch the media it wants to display to your visitors. Here’s the general lifecycle of an ad platform:

  • The platform is loaded by a single JavaScript file
  • Once that JavaScript loads, it will often load another library or load a “configuration file” that tells the library where to get the media for the ad
  • Many ad platforms then create an <iframe> on your page to load an HTML file from an advertiser
  • That <iframe> is often in control of a separate third-party advertiser, which can (and will) load whatever content it wants, including additional JavaScript files (for tracking viewing habits), media and CSS.

In the end, the original JavaScript ad library (e.g. adsbygoogle.js at 25 KB) might be responsible for loading a total of 250 KB of content, or more — a 10x increase.

How much of this content is visible from ResourceTiming? It all depends on a few things:

  1. How much content is loaded within your page’s top level window and has Timing-Allow-Origin set?
  2. How much content is loaded within same-origin (anonymous) frames and has Timing-Allow-Origin set?
  3. How much content is loaded from a cross-origin frame?

Let’s take a look at one example Google AdSense ad:

  • In the screenshot below, the green content was loaded into the top-level window. Thankfully, most of Google AdSense’s resources have Timing-Allow-Origin set, so we have complete visibility into how long they took and how many bytes were transferred. These 3 resources accounted for about 26 KB of data.
  • If we crawl into accessible same-origin frames, we’re able to find another 4 resources that accounted for 115 KB of data. Only one of these was missing Timing-Allow-Origin.
  • AdSense then loaded the advertiser’s content in a cross-origin frame. We’re unable to look at any of the resources in the cross-origin frame. There were 11 resources that accounted for 98 KB of data (40.8% of the total).

Google Ads layout

Most advertising platforms load their ads into cross-origin frames so they are less likely to affect the page itself. Unfortunately, as you’ve seen, this also means it’s easy for advertisers to unintentionally hide their true cost from ResourceTiming.

4.3 Page Weight

These cross-origin restrictions also make it really hard to measure Page Weight. Page Weight measures the total cost, in bytes, of loading a page and all of its resources.

Remember that the byte fields of ResourceTiming (transferSize, decodedBodySize and encodedBodySize) are hidden if the resource is cross-origin. In addition, any resources fetched from a cross-origin frame will be completely invisible to ResourceTiming.

So in order for ResourceTiming to expose the accurate Page Weight of a page, you need to ensure the following:

  • All of your resources are fetched from the same origin as the page
    • Unless: If any resources are fetched from other origins, they must have Timing-Allow-Origin
  • You must not have any cross-origin frames
    • Unless: If you are in control of the cross-origin frame’s HTML (or you can convince third-parties to add a snippet to their HTML), you might be able to “bubble up” ResourceTiming data from cross-origin frames

Anything less than the above, and you’re not seeing the full Page Weight.
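
As a rough sketch, a Page Weight estimate from ResourceTiming might look like this (the names are mine); the entriesWithoutSize count hints at how much is hidden, though resources inside cross-origin frames won’t appear at all:

// Sum visible transfer sizes; count entries that expose no size data
// (either restricted cross-origin entries or zero-byte cache hits)
function estimatePageWeight() {
  var bytes = 0, entriesWithoutSize = 0;

  window.performance.getEntriesByType("resource").forEach(function(entry) {
    if (entry.transferSize > 0) {
      bytes += entry.transferSize;
    } else {
      entriesWithoutSize++;
    }
  });

  return { visibleBytes: bytes, entriesWithoutSize: entriesWithoutSize };
}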

5. Real World data

How many of your resources are fully visible to ResourceTiming? How many are visible, but don’t contain the detailed timing and size information due to cross-origin restrictions? How many are completely hidden from your view?

There’s been previous research and HTTP Archive stats on Timing-Allow-Origin usage, and I wanted to expand on that research by also seeing how significantly the cross-origin <iframe> issue affects visibility.

To test this, I’ve built a small testing tool that runs headless Chrome (via Puppeteer).

The tool loads the Alexa Top 1000 websites, monitoring all resources fetched in the page via Chrome’s native networking APIs. It then executes a crawl of the ResourceTiming data, going into all of the frames it has access to, and compares what the browser fetched versus what is visible to ResourceTiming.

Resources are put into three visibility buckets:

  • Full: Same-origin resources, or cross-origin resources that have Timing-Allow-Origin set. These have all of the timing and size fields available.
  • Restricted: Cross-origin resources without Timing-Allow-Origin set. These only have startTime and responseEnd, and no size fields (so can’t be used for Page Weight calculations).
  • Missing: Any resource loaded in a cross-origin <iframe>. Completely invisible to ResourceTiming.

Overall

Across the Alexa Top 1000 sites, only 22.3% of resources have Full visibility from ResourceTiming.

A whopping 31.7% of resources are Missing from ResourceTiming, and the other 46% are Restricted, showing no detailed timing or size information:

ResourceTiming Visibility - Overall by Request Count
Overall ResourceTiming Visibility by Request Count

If you’re interested in measuring overall byte count for Page Weight, even more data is missing from ResourceTiming. Over 50% of all bytes transferred when loading the Alexa Top 1000 were Missing from ResourceTiming. In order to calculate Page Weight, you need Full Visibility, so if you used ResourceTiming to calculate Page Weight, you would only be (on average) including 18% of the actual bytes:

ResourceTiming Visibility - Overall by Byte Count
Overall ResourceTiming Visibility by Byte Count

By Site

Obviously every site is different, and depending on how it is structured, each site will have a differing degree of Visibility. You can see that the Alexa Top 1000 websites vary across the Visibility spectrum:

ResourceTiming Visibility Missing Percentage by Site

There are some sites with 100% Full Visibility. They serve nearly all of their content from their primary domain (or have Timing-Allow-Origin on all resources), and don’t contain any cross-origin frames. For example, a lot of simple landing pages such as search engines are like this:

  • google.com
  • sogou.com
  • telegram.org

Other sites have little or no Missing resources but are primarily Restricted Visibility. This often happens when you serve your homepage on one domain and all static assets on a secondary domain (CDN) without Timing-Allow-Origin:

  • stackexchange.com
  • chess.com
  • ask.fm
  • steamcommunity.com

Finally, there are some sites with over 50% Missing resources. After reviewing these sites, it appears the majority of this is due to advertising being loaded from cross-origin frames:

  • weather.com
  • time.com
  • wsj.com

By Content

It’s interesting how Visibility commonly differs by the type of content. For example, CSS and Fonts have a good chance of being Full or at least Restricted Visibility, while Tracking Pixels and Videos are often loaded in cross-origin frames, so they are completely Missing.

Breaking down the requests by type, we can see that different types of content are available at different rates:

ResourceTiming Visibility by Type

Let’s look at a couple of these types.

HTML Content

For the purposes of this analysis, HTML resources are those with an initiatorType of iframe, or URLs ending in .htm*. Note that the page itself is not being reported in this data.

From the previous chart, HTML content is:

  • 20% Full Visibility
  • 25% Restricted
  • 53% Missing

Remember that all top-level frames will show up in ResourceTiming, either as Full (same-origin) or Restricted (cross-origin without TAO). Thus, HTML content that is being reported as Missing must be an <iframe> loaded within a cross-origin <iframe>.

For example, https://pagead2.googlesyndication.com/pagead/s/cookie_push.html is often Missing because it’s loaded within container.html (a cross-origin frame):

<iframe src="https://tpc.googlesyndication.com/safeframe/1-0-17/html/container.html">
    <html>
        <iframe src="https://pagead2.googlesyndication.com/pagead/s/cookie_push.html">
    </html>
</iframe>

Here are the top 5 HTML URLs seen across the Alexa Top 1000 that are Missing in ResourceTiming data:

URL | Count
https://staticxx.facebook.com/connect/xd_arbiter/r/Ms1VZf1Vg1J.js?version=42 | 221
https://pagead2.googlesyndication.com/pagead/s/cookie_push.html | 195
https://static.eu.criteo.net/empty.html | 106
https://platform.twitter.com/jot.html | 83
https://tpc.googlesyndication.com/sodar/6uQTKQJz.html | 62

All of these appear to be <iframe>s injected for cross-frame communication.

Video Content

Video content is:

  • 2% Full Visibility (0.8% by byte count)
  • 22% Restricted (1.9% by byte count)
  • 75% Missing (97.3% by byte count)

Missing video content appears to be pretty site-specific — there aren’t many shared video URLs across sites (which isn’t too surprising).

What is surprising is how often plain <video> tags don’t show up in ResourceTiming data. From my experimentation, this appears to be because of when ResourceTiming data surfaces for <video> content.

In the testing tool, I capture ResourceTiming data at the onload event — the point at which the page appears ready. In Chrome, video content can start playing before onload, and the browser won’t delay the onload event while it loads the full video. So the user might be seeing the first frames of the video, but not the entire content of the video, by the onload event.

However, it looks like the ResourceTiming entry isn’t added until after the full video — which may be several megabytes — has been loaded from the network.

So unfortunately Video content is especially vulnerable to being Missing, simply because it hasn’t loaded all frames by the onload event (if that’s when you’re capturing all of the ResourceTiming data).

Also note that some browsers will (currently) never add ResourceTiming entries for <video> tags.

Most Missing Content

Across the Alexa Top 1000, there are some URLs that are frequently being loaded in cross-origin frames.

The top URLs appear to be a mixture of JavaScript, IFRAMEs used for cross-frame communication, and images:

URL | Count | Total Bytes
tpc.googlesyndication.com/pagead/js/r20180312/r20110914/client/ext/m_window_focus_non_hydra.js | 291 | 350946
staticxx.facebook.com/connect/xd_arbiter/r/Ms1VZf1Vg1J.js?version=42 | 221 | 3178502
tpc.googlesyndication.com/pagead/js/r20180312/r20110914/activeview/osd_listener.js | 219 | 5789703
tpc.googlesyndication.com/pagead/js/r20180312/r20110914/abg.js | 215 | 4850400
pagead2.googlesyndication.com/pagead/s/cookie_push.html | 195 | 144690
tpc.googlesyndication.com/safeframe/1-0-17/js/ext.js | 178 | 1036316
pagead2.googlesyndication.com/bg/lSvH2GMDHdWiQ5txKk8DBwe8KHVpOosizyQXSe1BYYE.js | 178 | 886084
tpc.googlesyndication.com/pagead/js/r20180312/r20110914/client/ext/m_js_controller.js | 163 | 2114273
googleads.g.doubleclick.net/pagead/id | 146 | 8246
www.google-analytics.com/analytics.js | 126 | 1860568
static.eu.criteo.net/empty.html | 106 | 22684
static.criteo.net/flash/icon/nai_small.png | 106 | 139814
static.criteo.net/flash/icon/nai_big.png | 106 | 234154
platform.twitter.com/jot.html | 83 | 6895

Most Restricted Origins

We can group Restricted requests by domain to see opportunities where getting a third-party to add the Timing-Allow-Origin header would have the most impact:

Domain | Count | Total Bytes
images-eu.ssl-images-amazon.com | 714 | 13204306
www.facebook.com | 653 | 31286
www.google-analytics.com | 609 | 757088
www.google.com | 487 | 3487547
connect.facebook.net | 437 | 6353078
images-na.ssl-images-amazon.com | 436 | 8001256
tags.tiqcdn.com | 402 | 3070430
sb.scorecardresearch.com | 365 | 140232
www.googletagmanager.com | 264 | 7684613
i.ebayimg.com | 260 | 5489475
http2.mlstatic.com | 253 | 3434650
ib.adnxs.com | 243 | 1313041
cdn.doubleverify.com | 239 | 4950712
mc.yandex.ru | 235 | 2108922

Some of these domains are a bit surprising, because the companies behind them have been leaders in adding Timing-Allow-Origin to their third-party content, such as Google Analytics and the Facebook Like tag. Looking at some of the most popular URLs, it’s clear that they haven’t yet added TAO to a few popular resources that are often loaded cross-origin:

  • https://www.google.com/textinputassistant/tia.png
  • https://www.google-analytics.com/plugins/ua/linkid.js
  • https://www.google-analytics.com/collect
  • https://images-eu.ssl-images-amazon.com/images/G/01/ape/sf/desktop/xyz.html
  • https://www.facebook.com/fr/b.php

Let’s hope Timing-Allow-Origin usage continues to increase!

Buffer Sizes

By default, each frame will only collect up to 150 resources in the PerformanceTimeline. Once full, no new entries will be added.

How often do sites change the default ResourceTiming buffer size for the main frame?

Buffer Size | Number of Sites
150 | 854
200 | 2
250 | 2
300 | 15
400 | 5
500 | 5
1000 | 6
1500 | 1
100000 | 3
1000000 | 1

85.4% of sites don’t touch the default ResourceTiming buffer size. Some sites double it (to 300), and about 1% pick numbers of 1,000 or more.

There are four sites that are even setting it over 10,000:

  • messenger.com: 100,000
  • fbcdn.net: 100,000
  • facebook.com: 100,000
  • lenovo.com: 1,000,000

(this isn’t recommended unless you’re actively clearing the buffer)

Should you increase the default buffer size? Obviously the need to do so varies by site.

In our crawl of the Alexa Top 1000, we find that 14.5% of sites exceed 150 resources in the main frame, while only 15% of those sites have called setResourceTimingBufferSize() to ensure all resources were captured.

6. Workarounds

Given the restrictions we have on cross-origin resources and frames, what can we do to increase the visibility of requests on our pages?

6.1 Timing-Allow-Origin

To move Restricted Visibility resources to Full Visibility, you need to ensure all cross-origin resources have the Timing-Allow-Origin header.

This should be straightforward for any content you provide (e.g. on a CDN), but it may take convincing third-parties to also add this HTTP response header.

Most people specify * as the allowed origin list, so any domain can read the timing data. All third-party scripts should be doing this!

Timing-Allow-Origin: *

6.2 Ensuring a Proper ResourceTiming Buffer Size

By default, each frame will only collect up to 150 resources in the PerformanceTimeline. Once full, no new entries will be added.

Obviously this limitation will affect any site that loads more than 150 resources in the main frame (or more than 150 resources in an <iframe>).

You can change the default buffer size by calling setResourceTimingBufferSize():

(function(w){
  // Bail if the Performance API (or buffer-size control) isn't available
  if (!w ||
    !("performance" in w) ||
    !w.performance ||
    !w.performance.setResourceTimingBufferSize) {
    return;
  }

  // Raise the per-frame entry limit from the default of 150
  w.performance.setResourceTimingBufferSize(500);
})(window);

Alternatively, you can use a PerformanceObserver (in each frame) to ensure you’re not affected by the limit.
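
A sketch of the PerformanceObserver approach (supported in most modern browsers; register the observer as early as possible so entries aren’t missed):

// Accumulate ResourceTiming entries as they arrive, independent of
// the 150-entry PerformanceTimeline buffer
if (typeof PerformanceObserver === "function") {
  var resourceEntries = [];

  new PerformanceObserver(function(list) {
    resourceEntries = resourceEntries.concat(list.getEntries());
  }).observe({ entryTypes: ["resource"] });
}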

6.3 Bubbling ResourceTiming

It’s possible to “leak” ResourceTiming to parent frames by using window.postMessage() to talk between cross-origin frames.

Here’s some sample code that listens for new ResourceTiming entries coming in via a PerformanceObserver, then “bubbles” up the message to its parent frame. The top-level frame can then use these bubbled ResourceTiming entries and merge them with its own list from itself and same-origin frames.
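
The overall shape of that approach is sketched below (this is my own illustration, not the linked sample; the message format is made up, and the entries would still need their startTimes adjusted to the receiving window’s time origin):

// Inside each cooperating (cross-origin) frame: forward new entries up
new PerformanceObserver(function(list) {
  window.parent.postMessage({
    type: "resourcetiming-bubble",
    entries: list.getEntries().map(function(e) { return e.toJSON(); })
  }, "*");
}).observe({ entryTypes: ["resource"] });

// In the top-level frame: merge bubbled entries with our own
window.addEventListener("message", function(event) {
  if (event.data && event.data.type === "resourcetiming-bubble") {
    // validate event.origin, then merge event.data.entries
  }
});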

The challenge is obviously convincing all of your third-party / cross-origin frames to use the same bubbling code.

We are considering adding support for this to Boomerang.

6.4 Synthetic Monitoring

In the absence of perfect Visibility via the above methods, it makes sense to supplement your Real User Monitoring analytics with synthetic monitoring, which will have access to 100% of the resources.

You could also use synthetic tools to understand the average percentage of Missing resources, and use this to mentally adjust any RUM ResourceTiming data.

7. Summary

ResourceTiming is a fantastic API that has given us insight into previously invisible data. With ResourceTiming, we can get a good sense of how long critical resources are taking on real page loads and beyond.

Unfortunately, due to security and privacy restrictions, ResourceTiming data is limited or missing in many real-world site configurations. This makes it really hard to track your third-party libraries and ads, or to gather an accurate Page Weight.

My hope is that we get more third-party services to add Timing-Allow-Origin to all of the resources being commonly fetched. It’s not clear if there’s anything we can do to get more visibility into cross-origin frames, other than convincing them to always “bubble up” their resources.

For more information on how this data was gathered, you can look at the resourcetiming-visibility GitHub repo.

I’ve also put the raw data into a BigQuery dataset: wolverine-digital:resourcetiming_visibility.

Thanks to Charlie Vazac and the Boomerang team for feedback on this article.

An Audit of Boomerang’s Performance

December 29th, 2017

Table Of Contents

  1. Introduction
  2. Methodology
  3. Audit
    1. Boomerang Lifecycle
    2. Loader Snippet
    3. mPulse CDN
    4. boomerang.js size
    5. boomerang.js Parse Time
    6. boomerang.js init()
    7. config.json (mPulse)
    8. Work at onload
    9. The Beacon
    10. Work beyond onload
    11. Work at Unload
  4. TL;DR Summary
  5. Why did we write this article?

1. Introduction

Boomerang is an open-source JavaScript library that measures the page load experience of real users, commonly called RUM (Real User Measurement).

Boomerang’s mission is to measure all aspects of the user’s experience, while not affecting the load time of the page or causing a degraded user experience in any noticeable way (avoiding the Observer Effect). It can be loaded in an asynchronous way that will not delay the page load even if boomerang.js is unavailable.

This post will perform an honest audit of the entire “cost” of Boomerang on a page that it is measuring. We will review every aspect of Boomerang’s lifecycle, starting with the loader snippet through sending a beacon, and beyond. At each phase, we will try to quantify any overhead (JavaScript, network, etc) it causes. The goal is to be open and transparent about the work Boomerang has to do when measuring the user’s experience.

While it is impossible for Boomerang to cause zero overhead, Boomerang strives to be as minimal and lightweight as possible. If we find any inefficiencies along the way, we will investigate alternatives to improve the library.

You can skip down to the TL;DR at the end of this article if you just want an executive summary.

Update 2019-12: We’ve made some performance improvements since this article was originally written. You can read about them here and some sections have been updated to note the changes.

boomerang.js

Boomerang is an open-source library that is maintained by the developers at Akamai.

mPulse, the Real User Monitoring product from Akamai, utilizes a customized version of Boomerang for its performance measurements. The differences between the open-source boomerang.js and mPulse’s boomerang.js are mostly in the form of additional plugins mPulse uses to fetch a domain’s configuration and features.

While this audit will be focused on mPulse’s boomerang.js, most of the conclusions we draw can be applied to the open-source boomerang.js as well.

2. Methodology

This audit will be done on mPulse boomerang.js version 1.532.0.

While Boomerang captures performance data for all browsers going back to IE 6, this audit will primarily be looking at modern browsers: Chrome, Edge, Safari and Firefox. Modern browsers provide superior profiling and debugging tools, and theoretically provide the best performance.

Modern browsers also feature performance APIs such as NavigationTiming, ResourceTiming and PerformanceObserver. Boomerang will utilize these APIs, if available, and it is important to understand the processing required to use them.

Where possible, we will share any performance data we can in older browsers and on slower devices, to help compare best- and worst-case scenarios.

mPulse uses a customized version of the open-source Boomerang project. Specifically, it contains four extra mPulse-specific plugins. The plugins enabled in mPulse’s boomerang.js v1.532.0 are:

  • config-override.js (mPulse-only)
  • page-params.js (mPulse-only)
  • iframe-delay.js
  • auto-xhr.js
  • spa.js
  • angular.js
  • backbone.js
  • ember.js
  • history.js
  • rt.js
  • cross-domain.js (mPulse-only)
  • bw.js
  • navtiming.js
  • restiming.js
  • mobile.js
  • memory.js
  • cache-reload.js
  • md5.js
  • compression.js
  • errors.js
  • third-party-analytics.js
  • usertiming-compression.js
  • usertiming.js
  • mq.js
  • logn.js (mPulse-only)

For more information on any of these plugins, you can read the Boomerang API documentation.

3. Audit

With that, let’s get a brief overview of how Boomerang operates!

3.1 Boomerang Lifecycle

From the moment Boomerang is loaded on a website to when it sends a performance data beacon, the following is a diagram and overview of Boomerang’s lifecycle (as seen by Chrome Developer Tools’ Timeline):

Boomerang Lifecycle

  1. The Boomerang Loader Snippet is placed into the host page. This snippet loads boomerang.js in an asynchronous, non-blocking manner.
  2. The browser fetches boomerang.js from the remote host. In the case of mPulse’s boomerang.js, this is from the Akamai CDN.
  3. Once boomerang.js arrives on the page, the browser must parse and execute the JavaScript.
  4. On startup, Boomerang initializes itself and any bundled plugins.
  5. mPulse’s boomerang.js then fetches config.json from the Akamai CDN.
  6. At some point the page will have loaded. Boomerang will wait until after all of the load event callbacks are complete, then it will gather all of the performance data it can.
  7. Boomerang will ask all of the enabled plugins to add whatever data they want to the beacon.
  8. Once all of the plugins have completed their work, Boomerang will build the beacon and will send it out the most reliable way it can.

Each of these stages may cause work in the browser:

  • JavaScript parsing time
  • JavaScript executing time
  • Network overhead

The rest of this article will break down each of the above phases and we will dive into the ways Boomerang requires JavaScript or network overhead.

3.2 Loader Snippet

(Update 2019-12: The Loader Snippet has been rewritten for modern browsers, see this update for details.)

We recommend loading boomerang.js using the non-blocking script loader pattern. This methodology, developed by Philip Tellis and others, ensures that no matter how long it takes boomerang.js to load, it will not affect the page’s onload event. We recommend mPulse customers use this pattern to load boomerang.js whenever possible; open-source users of boomerang.js can use the same snippet, pointing at boomerang.js hosted on their own CDN.

The non-blocking script loader pattern is currently 47 lines of code and around 890 bytes. Essentially, an <iframe> is injected into the page, and once that IFRAME’s onload event fires, a <script> node is added to the IFRAME to load boomerang.js. Boomerang is then loaded in the IFRAME, but can access the parent window’s DOM to gather performance data when needed.
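
A heavily condensed sketch of the core idea follows (this is not the production snippet, which also handles older browsers, document.domain edge cases and more; the script URL is illustrative):

(function() {
  // Create a hidden IFRAME; the parent page's onload only waits for
  // the (blank) IFRAME document itself, not for scripts added later
  var iframe = document.createElement("iframe");
  iframe.src = "about:blank";
  iframe.title = "";
  iframe.style.cssText = "width:0;height:0;border:0;display:none";

  // Once the blank IFRAME fires its own load event, inject the
  // <script> into it: a slow boomerang.js can no longer block the
  // host page's onload
  iframe.addEventListener("load", function() {
    var doc = iframe.contentWindow.document;
    var script = doc.createElement("script");
    script.src = "https://c.go-mpulse.net/boomerang/API-KEY";
    doc.body.appendChild(script);
  });

  var firstScript = document.getElementsByTagName("script")[0];
  firstScript.parentNode.insertBefore(iframe, firstScript);
})();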

For proof that the non-blocking script loader pattern does not affect page load, you can look at this test case and these WebPagetest results:

WebPagetest results of a delayed script

In the example above, we’re using the non-blocking script loader pattern, and the JavaScript is being delayed by 5 seconds. WebPagetest shows that the page still loaded in just 0.323s (the blue vertical line) while the delayed JavaScript request took 5.344s to complete.

You can review the current version of the mPulse Loader Snippet if you’d like.

Common Ways of Loading JavaScript

Why not just add a <script> tag to load boomerang.js directly into the page’s DOM? Unfortunately, all of the common ways of adding <script> tags to a page will block the onload event if the script loads slowly. These include:

  • <script> tags embedded in the HTML from the server
  • <script> tags added at runtime by JavaScript
  • <script> tags with async or defer

Notably, async and defer remove the <script> tag from the critical path, but will still block the page’s onload event from firing until the script has been loaded.

Boomerang strives to not affect the host page’s performance timings, so we don’t want to affect when onload fires.

Examples of each of the common methods above follow:

<script> tags embedded in the HTML from the server

The oldest way of including JavaScript in a page, directly via a <script src="..."> tag, will definitely block the onload event.

In this WebPagetest result, we see that the 5-second JavaScript delay causes the page’s load time (blue vertical line) to be 5.295 seconds:

WebPagetest loading SCRIPT tag embedded in HTML

Other events such as domInteractive and domContentLoaded are also delayed.

<script> tags added at runtime by JavaScript

Many third-party scripts suggest adding their JavaScript to your page via a snippet similar to this:

<script>
var newScript = document.createElement('script');
var otherScript = document.getElementsByTagName('script')[0];
newScript.async = 1;
newScript.src = "...";
otherScript.parentNode.insertBefore(newScript, otherScript);
</script>

This script creates a <script> node on the fly, then inserts it into the document adjacent to the first <script> node on the page.

Unfortunately, while this helps ensure the domInteractive and domContentLoaded events aren’t delayed by the 5-second JavaScript, the onload event is still delayed. See the WebPagetest result for more details:

WebPagetest loading SCRIPT tag added at runtime by JavaScript

<script> tags with async or defer

The <script> async and defer attributes change how scripts are loaded. For both, the browser will not block document parsing, but the onload event will still be blocked until the JavaScript has been loaded.

See this WebPagetest result for an example. domInteractive and domContentLoaded events aren’t delayed, but the onload event still takes 5.329s to fire:

WebPagetest loading SCRIPT tag with async or defer

Loader Snippet Overhead

Now that we’ve looked at why you shouldn’t use any of the standard methods of loading JavaScript to load Boomerang, let’s see how the non-blocking script loader snippet works and what it costs.

The work being done in the snippet is pretty minimal, but this code needs to be run in a <script> tag in the host page, so it will be executed in the critical path of the page.

For most modern browsers and devices, the amount of work caused by the snippet should not noticeably affect the page’s load time. When profiling the snippet, you should see it take only about 2-15 milliseconds of CPU (with an exception for Edge documented below). This amount of JavaScript CPU time should be undetectable on most webpages.

Here’s a breakdown of the CPU time of the non-blocking loader snippet profiled from various devices:

Device | OS | Browser | Loader Snippet CPU time (ms)
PC Desktop | Win 10 | Chrome 62 | 7
PC Desktop | Win 10 | Firefox 57 | 2
PC Desktop | Win 10 | IE 10 | 12
PC Desktop | Win 10 | IE 11 | 46
PC Desktop | Win 10 | Edge 41 | 66
MacBook Pro (2017) | macOS High Sierra | Safari 11 | 2
Galaxy S4 | Android 4 | Chrome 56 | 37
Galaxy S8 | Android 7 | Chrome 63 | 9
iPhone 4 | iOS 7 | Safari 7 | 19
iPhone 5s | iOS 11 | Safari 11 | 9
Kindle Fire 7 (2016) | Fire OS 5 | Silk | 33

(Update 2019-12: The Loader Snippet has been rewritten for modern browsers, and the overhead has been reduced. See this update for details.)

We can see that most modern devices / browsers will execute the non-blocking loader snippet in under 10ms. For some reason, recent versions of IE / Edge on a fast Desktop still seem to take up to 66ms to execute the snippet — we’ve filed a bug against the Edge issue tracker.

On older (slower) devices, the loader snippet takes between 20-40ms to execute.

Let’s take a look at profiles for Chrome and Edge on the PC Desktop.

Chrome Loader Snippet Profile

Here’s an example Chrome 62 (Windows 10) profile of the loader snippet in action.

In this example, the IFRAME doesn’t load anything itself, so we’re just profiling the wrapper loader snippet code:

Chrome Profile

In general, we find that Chrome spends:

  • <1ms parsing the loader snippet
  • ~5-10ms executing the script and creating the IFRAME

Edge Loader Snippet Profile

Here’s an example Edge 41 (Windows 10) profile of the loader snippet:

Edge Profile

We can see that the loader snippet is more expensive in Edge (and IE 11) than any other platform or browser, taking up to 50ms to execute the loader snippet.

In experimenting with the snippet and looking at the profiling, it seems the majority of this time is caused by the document.open()/write()/close() sequence. With that sequence removed, the rest of the snippet takes a mere 2-5ms.

We have filed a bug against the Edge team to investigate. We have also filed a bug against ourselves to see if we can update the document in a way that doesn’t cause so much work in Edge.

Getting the Loader Snippet out of the Critical Path

If the overhead of the loader snippet is a concern, you can also defer loading boomerang.js until after the onload event has fired. We don’t recommend loading boomerang.js that late, if possible — the later it loads, the higher the possibility that you will “lose” traffic because boomerang.js doesn’t load before the user navigates away.

We have example code for how to do this in the Boomerang API docs.

Future Loader Snippet Work

We are currently experimenting with other ways of loading Boomerang in an asynchronous, non-blocking manner that does not require the creation of an anonymous IFRAME.

You can follow the progress of this research on the Boomerang Issues page.

Update 2019-12: The Loader Snippet has been rewritten to improve performance in modern browsers. See this update for details.

3.3 mPulse CDN

Now that we’ve convinced the browser to load boomerang.js from the network, how is it actually fetched?

Loading boomerang.js from a CDN is highly recommended. The faster Boomerang arrives on the page, the better chance it has of actually measuring the experience. Boomerang can arrive late on the page and still capture all of the performance metrics it needs (due to the NavigationTiming API), but there are downsides to it arriving later:

  • The longer it takes Boomerang to load, the less likely it’ll arrive on the page before the user navigates away (missing beacons)
  • If the browser does not support NavigationTiming, and if Boomerang loads after the onload event, (and if the loader snippet doesn’t also save the onload timestamp), the Page Load time may not be accurate
  • If you’re using some of the Boomerang plugins such as Errors (JavaScript Error Tracking) that capture data throughout the page’s lifecycle, they will not capture the data until Boomerang is on the page

For all of these reasons, we encourage the Boomerang Loader Snippet to be as early as possible in the <head> of the document and for boomerang.js to be served from a performant CDN.

boomerang.js via the Akamai mPulse CDN

For mPulse customers, boomerang.js is fetched via the Akamai CDN which provides lightning-quick responses from the closest location around the world.

According to our RUM data, the boomerang.js package loads from the Akamai CDN to visitors’ browsers in about 191ms, and is cached by the browser for 50% of page loads.

Note that because of the non-blocking loader snippet, even if Boomerang loads slowly (or not at all), it should have zero effect on the page’s load time.

The URL will generally look something like this:

https://c.go-mpulse.net/boomerang/API-KEY
https://s.go-mpulse.net/boomerang/API-KEY

Fetching boomerang.js from the above location will result in HTTP response headers similar to the following:

Cache-Control: max-age=604800, s-maxage=604800, stale-while-revalidate=60, stale-if-error=3600
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/javascript;charset=UTF-8
Date: [date]
Expires: [date + 7 days]
Timing-Allow-Origin: *
Transfer-Encoding: chunked
Vary: Accept-Encoding

(Note: these domains are not currently HTTP/2 enabled. However, HTTP/2 doesn’t provide a lot of benefit for a third-party service unless it needs to load multiple resources in parallel. In the case of Boomerang, boomerang.js, config.json and the beacon are all independent requests that will never overlap)

The headers that affect performance are Cache-Control, Content-Encoding and Timing-Allow-Origin:

Cache-Control
Cache-Control: max-age=604800, s-maxage=604800, stale-while-revalidate=60, stale-if-error=3600

The Cache-Control header tells the browser to cache boomerang.js for up to 7 days. This helps ensure that subsequent visits to the same site do not need to re-fetch boomerang.js from the network.

For our mPulse customers, when the Boomerang version is updated, using a 7-day cache header means it may take up to 7 days for clients to get the new boomerang.js version. We think that a 7-day update time is a smart trade-off between cache hit rate and upgrade delay.

For open-source Boomerang users, we recommend using a versioned URL with a long expiry (e.g. years).
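
For example, an open-source deployment might serve a versioned file with headers like these (illustrative values, not mPulse’s configuration):

https://your-cdn.example.com/boomerang-1.532.0.min.js

Cache-Control: public, max-age=31536000, immutable
Content-Encoding: gzip
Timing-Allow-Origin: *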

Using these headers, we generally see about 50% of page loads have boomerang.js in the cache.

Content-Encoding
Content-Encoding: gzip

Content-Encoding means boomerang.js is encoded via gzip, reducing its size over the wire. This reduces the transfer cost of boomerang.js to about 28% of its original size.

The Akamai mPulse CDN also supports Brotli (br) encoding. This reduces the transfer cost of boomerang.js to about 25% of its original size.

Timing-Allow-Origin
Timing-Allow-Origin: *

The Timing-Allow-Origin header is part of the ResourceTiming spec. If not set, cross-origin resources will only expose the startTime and responseEnd timestamps to ResourceTiming.

We add Timing-Allow-Origin to the HTTP response headers on the Akamai CDN so our customers can review the load times of boomerang.js from their visitors.

3.4 boomerang.js size

Update 2019-12: Boomerang’s size has been reduced from a few improvements. See this update for details.

boomerang.js generally comes packaged with multiple plugins. See above for the plugins used in mPulse’s Boomerang.

The combined source input file size (with debug comments, etc) of boomerang.js and the plugins is 630,372 bytes. We currently use UglifyJS2 for minification, which reduces the file size down to 164,057 bytes. Via the Akamai CDN, this file is then gzip compressed to 47,640 bytes, which is what the browser must transfer over the wire.

We are investigating whether other minifiers might shave some extra bytes off. It looks like UglifyJS3 could reduce the minified file size down to about 160,490 bytes (saving 3,567 bytes), though this only saves ~500 bytes after gzip compression. We used YUI Compressor in the past, and have tested other minifiers such as Closure, but UglifyJS has provided the best and most reliable minification for our needs.

As compared to the top 100 most popular third-party scripts, the minified / gzipped Boomerang package is in the 80th percentile for size. For reference, Boomerang is smaller than many popular bundled libraries such as AngularJS (111 kB), Ember.js (111 kB), Knockout (58 kB), React (42 kB), D3 (81 kB), moment.js (59 kB), jQuery UI (64 kB), etc.

That being said, we’d like to find ways to reduce its size even further. Some options we’re considering:

  • Enabling Brotli compression at the CDN. This would reduce the package size to about 41,000 bytes over the wire. (implemented in 2019)
  • Only sending plugins to browsers that support the specific API. For example, not sending the ResourceTiming plugin to browsers that don’t support it.
  • For mPulse customers, having different builds of Boomerang with different features enabled. For customers wanting a smaller package, that are willing to give up support for things like JavaScript Error Tracking, they could use a customized build that is smaller.

For open-source users of Boomerang, you have the ability to build Boomerang precisely for your needs by editing plugins.json. See the documentation for details.

3.5 boomerang.js Parse Time

Once the browser has fetched the boomerang.js payload from the network, it will need to parse (compile) the JavaScript before it can execute it.

It’s a little complicated to reliably measure how long the browser takes to parse the JavaScript. Thankfully there has been some great work from people like Carlos Bueno, Daniel Espeset and Nolan Lawson, which can give us a good cross-browser approximation of parse/compile that matches up well to what browser profiling in developer tools shows us.

Across our devices, here’s the amount of time browsers are spending parsing boomerang.js:

Device | OS | Browser | Parse time (ms)
PC Desktop | Win 10 | Chrome 62 | 11
PC Desktop | Win 10 | Firefox 57 | 6
PC Desktop | Win 10 | IE 10 | 7
PC Desktop | Win 10 | IE 11 | 6
PC Desktop | Win 10 | Edge 41 | 6
MacBook Pro (2017) | macOS High Sierra | Safari 11 | 6
Galaxy S4 | Android 4 | Chrome 56 | 47
Galaxy S8 | Android 7 | Chrome 63 | 13
iPhone 4 | iOS 7 | Safari 7 | 42
iPhone 5s | iOS 11 | Safari 11 | 12
Kindle Fire 7 (2016) | Fire OS 5 | Silk | 41

In our sample of modern devices, we see less than 15 milliseconds spent parsing the boomerang.js JavaScript. Older (lower-powered) devices such as the Galaxy S4 and Kindle Fire 7 may take 40 milliseconds or more to parse the JavaScript.

To measure the parse time, we’ve created a test suite that injects the boomerang.js source text into a <script> node on the fly, with random characters appended (to avoid the browser’s cache). This is based on the idea from Nolan Lawson, and matches pretty well to what is seen in Chrome Developer tools:

Chrome boomerang.js JavaScript Parse
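
A simplified sketch of the measurement technique (note that a script injected this way is parsed and executed synchronously, so this approximates parse time plus top-level execution):

// Inject the source into a fresh inline <script> and time it; the
// random comment defeats caching of previously-compiled script text
function measureParseTime(sourceText) {
  var script = document.createElement("script");
  script.text = sourceText + "\n// " + Math.random();

  var start = performance.now();
  document.body.appendChild(script);  // parses + executes synchronously
  var elapsed = performance.now() - start;

  document.body.removeChild(script);
  return elapsed;  // milliseconds of parse + top-level execution
}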

Possible Improvements

There’s not a lot we can do to improve the browser’s parse time, and different browsers may do more work during parsing than others.

The primary way for us to reduce parse time is to reduce the overall complexity of the boomerang.js package. Some of the changes we’re considering (which will especially help the slower devices) were discussed in the previous section on boomerang.js’ size.

3.6 boomerang.js init()

After the browser parses the boomerang.js JavaScript, it should execute the bootstrapping code in the boomerang.js package. Boomerang tries to do as little work as possible at this point — it wants to get off the critical path quickly, and defer as much work as it can to after the onload event.

When loaded, the boomerang.js package will do the following:

  • Create the global BOOMR object
  • Create all embedded BOOMR.plugins.* plugin objects
  • Initialize the BOOMR object, which:
    • Registers event handlers for important events such as pageshow, load, beforeunload, etc
    • Looks at all of the plugins underneath BOOMR.plugins.* and runs .init() on all of them

In general, the Boomerang core and each plugin takes a small amount of processing to initialize their data structures and to setup any events that they will later take action on (i.e. gather data at onload).

Below is the measured cost of Boomerang initialization and all of the plugins’ init():

Device | OS | Browser | init() (ms)
PC Desktop | Win 10 | Chrome 62 | 10
PC Desktop | Win 10 | Firefox 57 | 7
PC Desktop | Win 10 | IE 10 | 6
PC Desktop | Win 10 | IE 11 | 8
PC Desktop | Win 10 | Edge 41 | 11
MacBook Pro (2017) | macOS High Sierra | Safari 11 | 3
Galaxy S4 | Android 4 | Chrome 56 | 12
Galaxy S8 | Android 7 | Chrome 63 | 8
iPhone 4 | iOS 7 | Safari 7 | 10
iPhone 5s | iOS 11 | Safari 11 | 8
Kindle Fire 7 (2016) | Fire OS 5 | Silk | 15

Most modern devices are able to do this initialization in under 10 milliseconds. Older devices may take 10-20 milliseconds.

Here’s a sample Chrome Developer Tools visual of the entire initialization process:

Chrome boomerang.js Initialization

In the above trace, the initialization takes about 13 milliseconds. Broken down:

  • General script execution: 3ms
  • Creating the BOOMR object: 1.5ms
  • Creating all of the plugins: 2ms
  • Initializing BOOMR, registering event handlers, etc: 7.5ms
  • Initializing plugins: 2ms

Possible Improvements

In looking at various Chrome and Edge traces of the creation/initialization process, we’ve found a few inefficiencies in our code.

Since initialization is done in the critical path to page load, we want to try to minimize the amount of work done here. If possible, we should delay work to after the onload event, before we’re about to send the beacon.

In addition, the script currently executes all code sequentially. We could consider breaking up this solid block of code into several chunks, executed on a setTimeout(..., 0). This would help reduce the possibility of a Long Task (which can lead to UI delays and visitor frustration). The plugin creation and initialization is all done sequentially right now, and each could probably be executed after a short delay to allow for any other work that needs to run.

We’ve identified several such areas of improvement to investigate.

Bootstrapping Summary

So where are we at so far?

  1. The Boomerang Loader Snippet has executed, told the browser to fetch boomerang.js in an asynchronous, non-blocking way.
  2. boomerang.js was fetched from the CDN
  3. The browser has parsed the boomerang.js package
  4. The BOOMR object and all of the plugins are created and initialized

On modern devices, this will take around 10-40ms. On slower devices, this could take 60ms or more.

What’s next? If you’re fetching Boomerang as part of mPulse, Boomerang will initiate a config.json request to load the domain’s configuration. Users of the open-source Boomerang won’t have this step.

3.7 config.json (mPulse)

For mPulse, once boomerang.js has loaded, it fetches a configuration file from the mPulse servers. The URL looks something like:

https://c.go-mpulse.net/boomerang/config.json?...

This file is fetched via an XMLHttpRequest from the mPulse servers (on the Akamai CDN). Browsers are pros at sending XHRs, so you should not see any measurable work being done to issue the request.

Since this fetch is an XMLHttpRequest, it is asynchronous and should not block any other part of the page load. config.json might arrive before, or after, onload.
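In simplified form, the fetch looks something like this (a sketch, not the actual mPulse code; the query string is a placeholder):

// Sketch of the async config.json fetch (placeholder URL parameters)
var xhr = new XMLHttpRequest();
xhr.open("GET", "https://c.go-mpulse.net/boomerang/config.json?key=EXAMPLE", true);
xhr.onload = function() {
  if (xhr.status === 200) {
    var config = JSON.parse(xhr.responseText);  // parsing takes under 1ms
    BOOMR.init(config);  // re-run init() with the domain's configuration
  }
};
xhr.send();  // asynchronous: doesn't block the page load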

The size of the config.json response will vary by app. A minimal config.json will be around 660 bytes (430 bytes gzipped). Large configurations may be upward of 10 kB (2 kB gzipped). config.json contains information about the domain, an Anti-Cross-Site-Request-Forgery token, page groups, dimensions and more.

On the Akamai CDN, we see a median config.json download time of 240ms. Again, this is an asynchronous XHR, so it should not block any other work in the browser, but config.json is required before a beacon can be sent (due to the Anti-CSRF token), so it’s important that it’s fetched as soon as possible. config.json takes a bit longer to download than boomerang.js because it cannot be cached at the CDN.

config.json is served with the following HTTP response headers:

Cache-Control: private, max-age=300, stale-while-revalidate=60, stale-if-error=120
Timing-Allow-Origin: *
Content-Encoding: gzip

Note that the Cache-Control header specifies config.json should only be cached by the browser for 5 minutes. This allows our customers to change their domain’s configuration and see the results within 5 minutes.

Once config.json is returned, the browser must parse the JSON. While the time it takes depends on the size of config.json, we were not able to convince any browser to take more than 1 millisecond parsing even the largest config.json response.

Chrome config.json parsing

After the JSON is parsed, Boomerang sends the configuration to each of the plugins again (calling BOOMR.plugins.X.init() a second time). This is required because many plugins are disabled by default, and config.json turns on features such as ResourceTiming and SPA instrumentation if they’re enabled for the app.
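For example, the second init() call might enable features like this (the field names here are illustrative; the real config.json schema is mPulse-specific):

// Illustrative only -- the real config.json schema is mPulse-specific
BOOMR.init({
  "beacon_url": "https://c.go-mpulse.net/beacon/",  // hypothetical endpoint
  "PageParams": { "pageGroups": [ /* page group definitions */ ] },
  "ResourceTiming": { "enabled": true },  // turn on ResourceTiming capture
  "History": { "enabled": true, "auto": true }  // SPA instrumentation
});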

Below is the measured cost of re-running init() with the config.json data:

Device                | OS                | Browser    | config.json init() (ms)
----------------------|-------------------|------------|------------------------
PC Desktop            | Win 10            | Chrome 62  | 5
PC Desktop            | Win 10            | Firefox 57 | 7
PC Desktop            | Win 10            | IE 10      | 5
PC Desktop            | Win 10            | IE 11      | 5
PC Desktop            | Win 10            | Edge 41    | 5
MacBook Pro (2017)    | macOS High Sierra | Safari 11  | 3
Galaxy S4             | Android 4         | Chrome 56  | 45
Galaxy S8             | Android 7         | Chrome 63  | 10
iPhone 4              | iOS 7             | Safari 7   | 30
iPhone 5s             | iOS 11            | Safari 11  | 10
Kindle Fire 7 (2016)  | Fire OS 5         | Silk       | 20

Modern devices spend less than 10 milliseconds parsing the config.json response and re-initializing plugins with the data.

Chrome config.json

Possible Improvements

The same investigations we’ll be doing for the first init() case apply here as well.

3.8 Work at onload

Update 2019-12: Boomerang’s work at onload has been reduced in several cases. See this update for details.

By default, for traditional apps, Boomerang will wait until the onload event fires to gather, compress and send performance data about the user’s page load experience via a beacon.

For Single Page Apps, Boomerang waits until the later of the onload event or the completion of all critical-path network activity (such as JavaScript files). See the Boomerang documentation for more details on how Boomerang measures Single Page Apps.

Once the onload event (or SPA load event) has fired, Boomerang will queue a setImmediate() or setTimeout(..., 0) call before it continues with data collection. This ensures that Boomerang’s onload callback is effectively instantaneous and doesn’t affect the page’s “load time”, which is measured until all of the load event handlers are complete. Since Boomerang’s load event handler is just a setImmediate, it should not affect the loadEventEnd timestamp.

The setImmediate() queues the work as a new task, to run after all of the currently pending tasks have completed. Thus, immediately after the rest of the onload handlers have finished, Boomerang continues with gathering, compressing and beaconing the performance data.
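Conceptually, the handler looks like this (a simplified sketch with hypothetical function names, not the exact Boomerang source):

// Keep the load handler itself near-instantaneous
window.addEventListener("load", function() {
  // loadEventEnd isn't marked until all load handlers return, so do
  // nothing here except schedule the real work as a new task
  setTimeout(function() {
    gatherPluginData();  // hypothetical: each plugin collects its data
    sendBeacon();        // hypothetical: compress and send the beacon
  }, 0);
});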

At this point, Boomerang notifies any plugins that are listening for the onload event that it has happened. Each plugin is responsible for gathering, compressing and putting its data onto the Boomerang beacon. Boomerang plugins that add data at onload include:

  • RT adds general page load timestamps (overall page load time and other timers)
  • NavigationTiming gathers all of the NavigationTiming timestamps
  • ResourceTiming adds information about each downloaded resource via ResourceTiming
  • Memory adds memory usage and DOM statistics
  • etc
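As a rough illustration of that pattern (with hypothetical parameter names), a plugin’s onload handler adds name/value pairs to the beacon:

// Hypothetical example of a plugin adding data at onload
BOOMR.subscribe("page_ready", function() {
  var nav = performance.getEntriesByType("navigation")[0];
  if (nav) {
    // each addVar() becomes a parameter on the outgoing beacon
    BOOMR.addVar("nt_dom_complete", Math.round(nav.domComplete));
    BOOMR.addVar("nt_load_end", Math.round(nav.loadEventEnd));
  }
});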

Some plugins require more work than others, and different pages will be more expensive to process than others. We will sample the work being done during onload on 3 types of pages:

Profiling onload on a Blank Page

This is a page with only the minimal HTML required to load Boomerang. On it, there are 3 resources and the total onload processing time is about 15ms:

Chrome loading a blank page

The majority of the time is spent in 3 plugins: PageParams, RT and ResourceTiming:

  • PageParams handles domain configuration such as page group definitions. The amount of work this plugin does depends on the complexity of the domain configuration.
  • RT captures and calculates all of the timers on the page
  • ResourceTiming compresses all of the resources on the page, and is usually the most “expensive” plugin due to the compression.
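To give a sense of why that compression is the expensive step, here’s a greatly simplified sketch (Boomerang’s real plugin compresses URLs into an optimized trie; this version only rounds offsets and base-36 encodes them):

// Greatly simplified -- the real plugin builds an optimized URL trie
var entries = performance.getEntriesByType("resource");
var compressed = entries.map(function(e) {
  // round each timestamp and base-36 encode it to save bytes
  return [
    e.name,
    Math.round(e.startTime).toString(36),
    Math.round(e.responseEnd - e.startTime).toString(36)
  ].join("|");
});
// with hundreds of resources, this per-entry work dominates the
// onload CPU cost, especially on slower devices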

Profiling onload on a Blog

This is loading the nicj.net home page. Currently, there are 24 resources and the onload processing time is about 25ms:

Chrome loading a blog

The profile looks largely the same as the empty page, with more time spent in ResourceTiming compressing the resources. The increase of about 10ms can be directly attributed to the additional resources.

Profiling a Galaxy S4 (one of the slower devices we’ve been looking at) doesn’t look too different, taking only 28ms:

Chrome Galaxy S4 loading a blog

Profiling onload on a Retail Home Page

This is a load of the home page of a popular retail website. On it, there were 161 resources, and the onload processing took about 37ms:

Chrome loading a retail home page

Again, the profile looks largely the same as the previous two pages. On this page, 25ms is spent compressing the resources.

On the Galaxy S4, we can see that work collecting the performance data (primarily, ResourceTiming data) has increased to about 310ms due to the slower processing power:

Chrome Galaxy S4 loading a retail home page

We will investigate ways of reducing this work on mobile devices.

Possible Improvements

While the work done for the beacon is outside of the critical path of the page load, we’ve still identified a few areas of improvement.

3.9 The Beacon

After all of the plugins add their data to the beacon, Boomerang will prepare a beacon and send it to the specified cloud server.

Boomerang will send a beacon in one of 3 ways, depending on the beacon size and browser support:

  • An IMG element, with the beacon data in the URL’s query string (for small payloads)
  • An XMLHttpRequest POST (for larger payloads)
  • navigator.sendBeacon(), where supported
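A simplified sketch of that fallback logic follows (the ordering and size threshold here are illustrative, not Boomerang’s exact rules):

// Illustrative send logic -- threshold and ordering are simplified
function send(url, payload) {
  // prefer sendBeacon(): queued by the browser, survives unload
  if (navigator.sendBeacon && navigator.sendBeacon(url, payload)) {
    return;
  }
  if (payload.length > 2000) {
    // too large for a GET query string: POST via XHR
    var xhr = new XMLHttpRequest();
    xhr.open("POST", url, true);
    xhr.send(payload);
  } else {
    // small payloads ride along on an image GET
    var img = new Image();
    img.src = url + "?" + payload;
  }
}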

In general, the size of the beacon varies based on:

  • The type of beacon (onload, error, unload)
  • The version of Boomerang
  • Which plugins/features are enabled (especially whether ResourceTiming is enabled)
  • How the page is constructed
  • Which browser is being used

Here are some sample beacon sizes with Boomerang 1.532.0 in Chrome 63, with all plugins enabled:

  • Blank page:
    • Page Load beacon: 1,876 bytes
    • Unload beacon: 1,389 bytes
  • Blog:
    • Page Load beacon: 3,814-4,230 bytes
    • Unload beacon: 1,390 bytes
  • Retail:
    • Page Load beacon: 8,816-12,812 bytes
    • Unload beacon: 1,660 bytes

For the blog and retail site, here’s an approximate breakdown of the beacon’s data by type or plugin:

  • Required information (domain, start time, beacon type, etc): 2.7%
  • Session information: 0.6%
  • Page Dimensions: 0.7%
  • Timers: 2%
  • ResourceTiming: 85%
  • NavigationTiming: 5%
  • Memory: 1.7%
  • Misc: 2.3%

As you can see, ResourceTiming data dominates the beacon size, even after compression reduces it to less than 15% of its original size.

Boomerang still has to do a little bit of JavaScript work to create the beacon. Once all of the plugins have registered the data they want to send, Boomerang must then prepare the beacon payload for use in the IMG/XHR/sendBeacon beacon:

Chrome loading a blank page

Once the payload has been prepared, Boomerang queues the IMG, XMLHttpRequest or navigator.sendBeacon call.

Browsers are pros at loading images, sending XHRs and beacons: sending the beacon data should barely register on CPU profiles (56 microseconds in the above example). All of the methods of sending the beacon are asynchronous and will not affect the browser’s main thread or user experience.

The beacon itself is delivered quickly to the Akamai mPulse CDN. The servers immediately return a 0-byte image/gif 204 No Content response before processing any beacon data, so the browser gets a response right away and moves on.

3.10 Work Beyond onload

Besides capturing performance data about the page load process, some plugins may also be active beyond onload. For example:

  • The Errors plugin optionally monitors for JavaScript errors after page load and will send beacons for batches of errors when found
  • The Continuity plugin optionally monitors user interactions after page load and will send beacons to report on those experiences

Each of these plugins will have its own performance characteristics. We will profile them in a future article.

3.11 Work at Unload

In order to aid in monitoring Session Duration (how long someone was looking at a page), Boomerang will also attempt to send an unload beacon when the page is being navigated away from (or the browser is closed).

Boomerang listens for the beforeunload, pagehide and unload events, and will send a minimal beacon (with the rt.quit= parameter to indicate it’s an unload beacon).

Note that not all of these events fire reliably on mobile browsers, and we see only about 30% of page loads send a successful unload beacon.
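A minimal sketch of that instrumentation (the endpoint here is illustrative):

// Send a minimal unload beacon on the first unload-ish event
var beaconUrl = "/beacon";  // illustrative endpoint
var unloadSent = false;
function sendUnloadBeacon() {
  if (unloadSent) {
    return;
  }
  unloadSent = true;
  // rt.quit= marks this as an unload beacon
  navigator.sendBeacon(beaconUrl, "rt.quit=");
}
["beforeunload", "pagehide", "unload"].forEach(function(evt) {
  window.addEventListener(evt, sendUnloadBeacon);
});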

Here’s a Chrome Developer Tools profile of one unload beacon:

Chrome loading a blank page

In this example, we’re spending about 10ms at beforeunload before sending the beacon.

Possible Improvements

In profiling a few examples of the unload beacon, we’ve found some areas of improvement as well.

4. TL;DR Summary

So at the end of the day, what does Boomerang “cost”?

  • During the page load critical path (loader snippet, boomerang.js parse, creation and initialization), Boomerang will require approximately 10-40ms of CPU time on modern devices, and 60ms or more on lower-end devices
  • Outside of the critical path, Boomerang may require another 10-40ms of CPU time to gather and beacon performance data on modern devices, and upwards of 300ms on lower-end devices
  • Loading boomerang.js and its plugins will generally take less than 200ms to download (in a non-blocking way) and transfer around 50 kB
  • The mPulse beacon will vary by site, but may range from 2 kB to 20 kB or more, depending on the enabled features

We’ve identified several areas for improvement. They’re being tracked on the Boomerang GitHub Issues page.

If you’re concerned about the overhead of Boomerang, you have some options:

  1. If you can’t have Boomerang executing on the critical path to onload, you can delay the loader snippet so it executes after the onload event (see the sketch after this list).
  2. Disable or remove plugins
    • For mPulse customers, you should disable any features that you’re not using
    • For open-source users of Boomerang, you can remove plugins you’re not using by modifying plugins.json prior to building.
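For option 1, a minimal sketch of delaying the snippet (loadBoomerang() stands in for the standard loader snippet):

// Defer the loader snippet until after the onload event
function loadBoomerang() {
  // ... the standard asynchronous loader snippet goes here ...
}
if (document.readyState === "complete") {
  // the page has already loaded
  setTimeout(loadBoomerang, 0);
} else {
  window.addEventListener("load", function() {
    // wait one more task so we don't extend loadEventEnd
    setTimeout(loadBoomerang, 0);
  });
}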

5. Why did we write this article?

We wanted to gain a deeper understanding of how much impact Boomerang has on the host page, and to identify any areas we can improve on. Reviewing all aspects of Boomerang’s lifecycle helps put the overhead into perspective, so we can identify areas to focus on.

We’d love to see this kind of analysis from other third-party scripts. Details like these can help websites understand the cost/benefit analysis of adding a third-party script to their site.

Thanks

I’d like to thank everyone who has worked on Boomerang and who helped gather data for this article: Philip Tellis, Charles Vazac, Nigel Heron, Andreas Marschke, Paul Calvano, Yoav Weiss, Simon Hearne, Ron Pierce and all of the open-source contributors.

Reliably Measuring Responsiveness in the Wild

July 20th, 2017

Slides

At Fluent 2017, Shubhie Panicker and I gave a talk titled Reliably Measuring Responsiveness in the Wild. Here’s the abstract:

Responsiveness to user interaction is crucial for users of web apps, and businesses need to be able to measure responsiveness so they can be confident that their users are happy. Unfortunately, users are regularly bogged down by frustrations such as a delayed “time to interactive” during page load, high or variable input latency on critical interaction events (tap, click, scroll, etc.), and janky animations or scrolling. These negative experiences turn away visitors, affecting the bottom line. Sites that include third-party content (ads, social plugins, etc.) are frequently the worst offenders.

The culprits behind all of these responsiveness issues are “long tasks,” which monopolize the UI thread for extended periods and block other critical tasks from executing. Developers lack the necessary APIs and tools to measure and gain insight into such problems in the wild and are essentially flying blind, trying to figure out what the main offenders are. While developers are able to measure some aspects of responsiveness, it’s often not in a reliable, performant, or “good citizen” way, and it’s nearly impossible to correctly identify the perpetrators.

Shubhie Panicker and Nic Jansma share new web performance APIs that enable developers to reliably measure responsiveness and correctly identify first- and third-party culprits for bad experiences. Shubhie and Nic dive into real-user measurement (RUM) web performance APIs they have developed: standardized web platform APIs such as Long Tasks as well as JavaScript APIs that build atop platform APIs, such as Time To Interactive. Shubhie and Nic then compare these measurements to business metrics using real-world data and demonstrate how web developers can detect issues and reliably measure responsiveness in the wild—both at page load and postload—and thwart the culprits, showing you how to gather the data you need to hold your third-party scripts accountable.

You can watch the presentation on YouTube.

Measuring Real User Performance in the Browser

October 11th, 2016

Philip Tellis and I gave a tutorial on Measuring Real User Performance in the Browser at Velocity New York 2016. Slides can be found at Slideshare:

measuring-real-user-performance-in-the-browser

In the tutorial, we cover everything you need to know about measuring your visitors’ experience (also known as Real User Monitoring, or RUM):

  • History of Real User Measurement
  • Browser Performance APIs
  • Visual Experience
  • Beaconing
  • Single Page Apps
  • Continuity
  • Nixing Noise

The 3-hour tutorial video is also available on YouTube.