NicJ.net

Compressing ResourceTiming

At SOASTA, we’re building tools and services to help our customers understand and improve the performance of their websites. Our mPulse product utilizes Real User Monitoring to capture data about page-load performance.

For browser-side data collection, mPulse uses Boomerang, which beacons every single page-load experience back to our real time analytics engine. Boomerang utilizes NavigationTiming when possible to relay accurate performance metrics about the page load, such as the timings of DNS, TCP, SSL and the HTTP response.

ResourceTiming is another important feature in modern browsers that gives JavaScript access to performance metrics about the page’s components fetched from the network, such as CSS, JavaScript and images. mPulse will soon be releasing a new feature that lets our customers view the complete waterfall of every visitor’s session, which can be a tremendous help in debugging performance issues.

The challenge with ResourceTiming is that it offers a lot of data if you want to beacon it all back to a server. For each resource, there’s data on:

Here’s an example of performance.getEntriesByType('resource') of a single resource:

{"responseEnd":2436.426999978721,"responseStart":2435.966999968514,
"requestStart":2435.7460000319406,"secureConnectionStart":0,
"connectEnd":2434.203000040725,"connectStart":2434.203000040725,
"domainLookupEnd":2434.203000040725,"domainLookupStart":2434.203000040725,
"fetchStart":2434.203000040725,"redirectEnd":0,"redirectStart":0,
"initiatorType":"internal","duration":2.2239999379962683,
"startTime":2434.203000040725,"entryType":"resource","name":"http://nicj.net/"}

JSON.stringify()‘d, that’s 469 bytes for this one resource.  Multiple that by each resource on your page, and you can quickly see that gathering and beaconing all of this data back to a server will take a lot of bandwidth and storage if you’re tracking this for every single visitor to your site. The HTTP Archive tells us that the average page is composed of 99 HTTP resources, with an average URL length of 85 bytes.

So for a rough estimate you could expect around 45 KB of ResourceTiming data per page load.

The Goal

We wanted to find a way to compress this data before we JSON serialize it and beacon it back to our server.

Philip Tellis, the author of Boomerang, and I have come up with several compression techniques that can reduce the above data to about 15% of it’s original size.

Techniques

Let’s start out with a single resouce, as you get back from window.performance.getEntriesByType("resource"):

{  
  "responseEnd":323.1100000002698,
  "responseStart":300.5000000000000,
  "requestStart":252.68599999981234,
  "secureConnectionStart":0,
  "connectEnd":0,
  "connectStart":0,
  "domainLookupEnd":0,
  "domainLookupStart":0,
  "fetchStart":252.68599999981234,
  "redirectEnd":0,
  "redirectStart":0,
  "duration":71.42400000045745,
  "startTime":252.68599999981234,
  "entryType":"resource",
  "initiatorType":"script",
  "name":"http://foo.com/js/foo.js"
}

Step 1: Drop some attributes

We don’t need:

{  
  "responseEnd":323.1100000002698,
  "responseStart":300.5000000000000,
  "requestStart":252.68599999981234,
  "secureConnectionStart":0,
  "connectEnd":0,
  "connectStart":0,
  "domainLookupEnd":0,
  "domainLookupStart":0,
  "redirectEnd":0,
  "redirectStart":0,
  "startTime":252.68599999981234,
  "initiatorType":"script",
  "name":"http://foo.com/js/foo.js"
}

Step 2: Change into a fixed-size array

Since we know all of the attributes ahead of time, we can change the object into a fixed-sized array. We’ll create a new object where each key is the URL, and its value is a fixed-sized array. We’ll take care of duplicate URLs later:

{ "name": [initiatorType, startTime, redirectStart, redirectEnd,
   domainLookupStart, domainLookupEnd, connectStart, secureConnectionStart, 
   connectEnd, requestStart, responseStart, responseEnd] }

With our data:

{ "http://foo.com/foo.js": ["script", 252.68599999981234, 0, 0
   0, 0, 0, 0, 
   0, 252.68599999981234, 300.5000000000000, 323.1100000002698] }

Step 3: Drop microsecond timings

For our purposes, we don’t need sub-milliscond accuracy, so we can round all timings to the nearest millisecond:

{ "http://foo.com/foo.js": ["script", 252, 0, 0, 0, 0, 0, 0, 0, 252, 300, 323] }

Step 4: Trie

We can now use an optimized Trie to compress the URLs. A Trie is an optimized tree structure where associative array keys are compressed.

Mark Holland and Mike McCall discussed this technique at Velocity this year.

Here’s an example with multiple resources:

{
    "http://": {
        "foo.com/": {
            "js/foo.js": ["script", 252, 0, 0, 0, 0, 0, 0, 0, 252, 300, 323]
            "css/foo.css": ["css", 300, 0, 0, 0, 0, 0, 0, 0, 305, 340, 500]
        },
        "other.com/other.css": [...]
    }
}

Step 5: Offset from startTime

If we offset all of the timestamps from startTime (which they should always be larger than), they may use fewer characters:

{
    "http://": {
        "foo.com/": {
            "js/foo.js": ["script", 252, 0, 0, 0, 0, 0, 0, 0, 0, 48, 71],
            "css/foo.css": ["script", 300, 0, 0, 0, 0, 0, 5, 40, 200]
        },
        "other.com/other.css": [...]
    }
}

Step 6: Reverse the timestamps and drop any trailing 0s

The only two required timestamps in ResourceTiming are startTime and responseEnd. Other timestamps may be zero due to being a Cross-Origin resource, or a timestamp that was “zero” because it didn’t take any time offset from startTime, such as domainLookupStart if DNS was already resolved.

If we re-order the timestamps so that, after startTime, we put them in reverse order, we’re more likely to have the “zero” timestamps at the end of the array.

{ "name": [initiatorType, startTime, responseEnd, responseStart,
   requestStart, connectEnd, secureConnectionStart, connectStart,
   domainLookupEnd, domainLookupStart, redirectEnd, redirectStart] }
{
    "http://": {
        "foo.com/": {
            "js/foo.js": ["script", 252, 71, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0]
            "css/foo.css": ["script", 300, 200, 40, 5, 0, 0, 0, 0, 0, 0, 0, 0]
        }
    }
}

Once we have all of the zero timestamps towards the end of the array, we can drop any repeating trailing zeros. When reading later, missing array values can be interpreted as zero.

{
    "http://": {
        "foo.com/": {
            "js/foo.js": ["script", 252, 71, 48]
            "css/foo.css": ["css", 300, 200, 40]
        }
    }
}

Step 7: Convert initiatorType into a lookup

Using a numeric lookup instead of a string will save some bytes for initiatorType:

var INITIATOR_TYPES = {
    "other": 0,
    "img": 1,
    "link": 2,
    "script": 3,
    "css": 4,
    "xmlhttprequest": 5
};
{
    "http://": {
        "foo.com/": {
            "js/foo.js": [3, 252, 71, 48]
            "css/foo.css": [4, 300, 200, 40]
        }
    }
}

Step 8: Use Base36 for numbers

Base 36 is convenient because it can result in smaller byte-size than Base-10 and has built-in browser support in JavaScript toString(36):

{
    "http://": {
        "foo.com/": {
            "js/foo.js": [3, "70", "1z", "1c"]
            "css/foo.css": [4, "8c", "5k", "14"]
        }
    }
}

Step 9: Compact the array into a string

A JSON string representation of an array (separated by commas) saves a few bytes during serialization. We’ll designate the first byte as the initiatorType:

{
    "http://": {
        "foo.com/": {
            "js/foo.js": "370,1z,1c",
            "css/foo.css": "48c,5k,14"
        }
    }
}

Step 10: Multiple hits

Finally, if there are multiple hits to the same resource, the keys (URLs) in the Trie will conflict with each other.

Let’s fix this by concatenating multiple hits to the same URL via a special character such as pipe | (see foo.js below):

{
    "http://": {
        "foo.com/": {
            "js/foo.js": "370,1z,1c|390,1,2",
            "css/foo.css": "48c,5k,14"
        }
    }
}

Step 11: Gzip or MsgPack

Applying gzip compression or MsgPack can give additional savings during transport and storage.

Results

Overall, the above techniques compress raw JSON.stringify(performance.getEntriesByType('resource')) to about 15% of its original size.

Taking a few sample pages:

How-To

These compression techniques have been added to the latest version of Boomerang.

I’ve also released a small library that does the compression as well as de-compression of the optimized result: resourcetiming-compression.js.

This article also appears on soasta.com.

Share this: