NavigationTiming in Practice

May 27th, 2015

Last updated: May 2021

Table Of Contents

  1. Introduction
  2. How was it done before?
    2.1. What’s Wrong With This?
  3. Interlude: DOMHighResTimestamp
    3.1. Why Not the Date Object?
  4. Accessing NavigationTiming Data
    4.1. NavigationTiming Timeline
    4.2. Example Data
    4.3. How to Use
    4.4. NavigationTiming2
    4.5. Service Workers
  5. Using NavigationTiming Data
    5.1. DIY
    5.2. Open-Source
    5.3. Commercial Solutions
  6. Availability
  7. Tips
  8. Browser Bugs
  9. Conclusion
  10. Updates

Introduction

NavigationTiming is a specification developed by the W3C Web Performance working group, with the goal of exposing accurate performance metrics that describe your visitor’s page load experience (via JavaScript).

NavigationTiming (Level 1) is currently a Recommendation, which means that browser vendors are encouraged to implement it, and it has been shipped in all major browsers.

NavigationTiming (Level 2) is a Working Draft and adds additional features like content sizes and other new data. It is still a work-in-progress, but many browsers already support it.

As of May 2021, 97.9% of the world-wide browser market-share supports NavigationTiming (Level 1).

Let’s take a deep-dive into NavigationTiming!

How was it done before?

NavigationTiming exposes performance metrics to JavaScript that were never available in older browsers, such as your page’s network timings and breakdown. Prior to NavigationTiming, you could not measure your page’s DNS, TCP, request or response times because all of those phases occurred before your application (JavaScript) started up, and the browser did not expose them.

Before NavigationTiming was available, you could still estimate some performance metrics, such as how long it took for your page’s static resources to download. To do this, you can hook into the browser’s onload event, which is fired once all of the static resources on your page (such as JavaScript, CSS, IMGs and IFRAMES) have been downloaded.

Here’s sample (though not very accurate) code:

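<html><head><script>
// record a timestamp as early as possible in the page
var start = new Date().getTime();

function onLoad() {
  // elapsed time from the script running to the load event
  var pageLoadTime = (new Date().getTime()) - start;
}

// the load event fires on window; document.body does not exist yet in the <head>
window.addEventListener('load', onLoad, false);
</script></head></html>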

What’s wrong with this?

First, it only measures the time from when the JavaScript runs to when the last static resource is downloaded.

If that’s all you’re interested in measuring, that’s fine, but there’s a large part of the user’s experience that you’ll be blind to.

Let’s review the main phases that the browser goes through when fetching your HTML:

  1. DNS resolve: Look up the domain name to find what IP address to connect to
  2. TCP connect: Connect to your server on port 80 (HTTP) or 443 (HTTPS) via TCP
  3. Request: Send an HTTP request, with headers and cookies
  4. Response: Wait for the server to start sending the content (back-end time)

It’s only after Phase 4 (Response) is complete that your HTML is parsed and your JavaScript can run.

Phase 1-4 timings will vary depending on the network. One visitor might fetch your content in 100 ms while it might take another user, on a slower connection, 5,000 ms before they see your content. That delay translates into a painful user-experience.

Thus if you’re only monitoring your application from JavaScript in the <HEAD> to the onload (as in the snippet above), you are blind to a large part of the overall experience.

So the primitive approach above has several downsides:

  • It only measures the time from when the JavaScript runs to when the last static resource is downloaded
  • It misses the initial DNS lookup, TCP connection and HTTP request phases
  • new Date().getTime() is not reliable

Interlude – DOMHighResTimeStamp

What about #3? Why is new Date().getTime() (or Date.now() or +(new Date)) not reliable?

Let’s talk about another modern browser feature, DOMHighResTimeStamp, aka performance.now().

DOMHighResTimeStamp is a new data type for performance interfaces. In JavaScript, it’s typed as a regular number primitive, but anything that exposes a DOMHighResTimeStamp is following several conventions.

Notably, DOMHighResTimeStamp is a monotonically non-decreasing timestamp with an epoch of performance.timeOrigin and sub-millisecond resolution. It is used by several W3C webperf performance specs, and can always be queried via window.performance.now();

Why not just use the Date object?

DOMHighResTimeStamp helps solve three shortcomings of Date. Let’s break its definition down:

  • monotonically non-decreasing means that every time you fetch a DOMHighResTimeStamp, its value will always be at least the same as when you accessed it last. It will never decrease.
  • timestamp with an epoch of performance.timeOrigin means its value is a timestamp, whose basis (start) is window.performance.timeOrigin. Thus a DOMHighResTimeStamp of 10 means it’s 10 milliseconds after the time given by performance.timeOrigin.
  • sub-millisecond resolution means the value has a resolution of at least a millisecond, and usually finer. In practice, DOMHighResTimeStamps will be a number with the milliseconds as whole numbers and fractions of a millisecond represented after the decimal. For example, 1.5 means 1500 microseconds, while 100.123 means 100 milliseconds and 123 microseconds.

Each of these points addresses a shortcoming of the Date object. First and foremost, monotonically non-decreasing fixes a subtle issue with the Date object that you may not know exists. The problem is that Date simply exposes the value of your end-user’s clock, according to the operating system. While the majority of the time this is OK, the system clock can be influenced by outside events, even in the middle of when your app is running.

For example, when the user changes their clock, or an atomic clock service adjusts it, or daylight-savings kicks in, the system clock may jump forward, or even go backwards!

So imagine you’re performance-profiling your application by keeping track of the start and end timestamps of some event via the Date object. You track the start time… and then your end-user’s atomic clock service kicks in and adjusts the time forward an hour… and now, from JavaScript Date’s point of view, it seems like your application just took an hour to do a simple task.

This can even lead to problems when doing statistical analysis of your performance data. Imagine if your monitoring tool is taking the mean value of operational times and one of your users’ clocks jumped forward 10 years. That outlier, while "true" from the point of view of Date, will skew the rest of your data significantly.

DOMHighResTimeStamp addresses this issue by guaranteeing it is monotonically non-decreasing. Every time you access performance.now(), you are guaranteed it will be at least equal to, if not greater than, the last time you accessed it.
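As a minimal sketch of what this looks like in practice, here is a small timing helper built on performance.now(). The helper name timeIt is hypothetical (it is not part of any spec), and the sketch assumes a global performance object is available, as it is in modern browsers.

```javascript
// Assumption: a global `performance` object exists (modern browsers, recent Node.js).
// `timeIt` is a hypothetical helper name used only for this illustration.
function timeIt(fn) {
  var start = performance.now();
  var result = fn();
  // performance.now() is monotonically non-decreasing,
  // so this difference can never be negative
  var duration = performance.now() - start;
  return { result: result, duration: duration };
}

// Usage: time a small piece of work
var measured = timeIt(function() {
  var sum = 0;
  for (var i = 0; i < 100000; i++) {
    sum += i;
  }
  return sum;
});
// measured.duration is a (possibly fractional) number of milliseconds
```

Because both timestamps come from the same monotonic clock, a change to the user's system clock in the middle of the measurement should not affect the result.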

You shouldn’t mix Date timestamps (which are Unix epoch based, so you get sample times like 1430700428519) with DOMHighResTimeStamps. If the user’s clock changes, and you mix both Date and DOMHighResTimeStamps, the former could be wildly different from the latter.

To help enforce this, DOMHighResTimeStamp is not Unix epoch based. Instead, its epoch is window.performance.timeOrigin (more details of which are below). Since its values are expressed in milliseconds, this means that the values you get from it are the number of milliseconds since the page load started. As a benefit, this makes them easier to read than Date timestamps, since they’re relatively small and you don’t need to do (now - startTime) math to know when something started running.

DOMHighResTimeStamp is available in most modern browsers, including Internet Explorer 10+, Edge, Firefox 15+, Chrome 20+, Safari 8+ and Android 4.4+. If you want to be able to always get timestamps via window.performance.now(), you can use a polyfill. Note these polyfills will be millisecond-resolution timestamps with an epoch of "something" in unsupported browsers, since monotonically non-decreasing can’t be guaranteed and sub-millisecond isn’t available unless the browser supports it.
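To illustrate the trade-offs such a polyfill makes, here is a minimal sketch, not any particular published polyfill. The helper name createNow and its two parameters are assumptions made for this example: it prefers the native performance.now() and otherwise falls back to Date-based timestamps with an epoch of "whenever the helper was created".

```javascript
// Hypothetical helper: returns a now() function.
//   perf    - a performance-like object (or null if unsupported)
//   dateNow - a function returning epoch milliseconds, e.g. Date.now
function createNow(perf, dateNow) {
  if (perf && typeof perf.now === 'function') {
    // native support: sub-millisecond and monotonic
    return function() {
      return perf.now();
    };
  }

  // Fallback: millisecond resolution, epoch = the moment createNow() ran,
  // and NOT guaranteed to be monotonic because it follows the system clock
  var origin = dateNow();
  return function() {
    return dateNow() - origin;
  };
}

// Usage: pick the best available clock once, then call now() everywhere
var now = createNow(
  typeof performance !== 'undefined' ? performance : null,
  Date.now
);
```

Passing the clock sources in as arguments is only a design choice for this sketch; it makes the fallback path easy to exercise without an old browser.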

As a summary:

                               Date                    DOMHighResTimeStamp
Accessed via                   new Date().getTime()    performance.now()
Resolution                     millisecond             sub-millisecond
Start                          Unix epoch              performance.timeOrigin
Monotonically non-decreasing   No                      Yes
Affected by user’s clock       Yes                     No
Example                        1420147524606           3392.275999998674

Accessing NavigationTiming Data

So, how do you access NavigationTiming data?

The simplest (and now deprecated) method is that all of the performance metrics from NavigationTiming are available underneath the window.performance DOM object. See the NavigationTiming2 section for a more modern way of accessing this data.

NavigationTiming’s metrics are primarily available underneath window.performance.navigation and window.performance.timing. The former provides performance characteristics (such as the type of navigation, or the number of redirects taken to get to the current page) while the latter exposes performance metrics (timestamps).

Here’s the WebIDL (definition) of the Level 1 interfaces (see the NavigationTiming2 section below for details on accessing the new data)

window.performance.navigation:

interface PerformanceNavigation {
  const unsigned short TYPE_NAVIGATE = 0;
  const unsigned short TYPE_RELOAD = 1;
  const unsigned short TYPE_BACK_FORWARD = 2;
  const unsigned short TYPE_RESERVED = 255;
  readonly attribute unsigned short type;
  readonly attribute unsigned short redirectCount;
};

window.performance.timing:

interface PerformanceTiming {
    readonly attribute unsigned long long navigationStart;
    readonly attribute unsigned long long unloadEventStart;
    readonly attribute unsigned long long unloadEventEnd;
    readonly attribute unsigned long long redirectStart;
    readonly attribute unsigned long long redirectEnd;
    readonly attribute unsigned long long fetchStart;
    readonly attribute unsigned long long domainLookupStart;
    readonly attribute unsigned long long domainLookupEnd;
    readonly attribute unsigned long long connectStart;
    readonly attribute unsigned long long connectEnd;
    readonly attribute unsigned long long secureConnectionStart;
    readonly attribute unsigned long long requestStart;
    readonly attribute unsigned long long responseStart;
    readonly attribute unsigned long long responseEnd;
    readonly attribute unsigned long long domLoading;
    readonly attribute unsigned long long domInteractive;
    readonly attribute unsigned long long domContentLoadedEventStart;
    readonly attribute unsigned long long domContentLoadedEventEnd;
    readonly attribute unsigned long long domComplete;
    readonly attribute unsigned long long loadEventStart;
    readonly attribute unsigned long long loadEventEnd;
};

The NavigationTiming Timeline

Each of the timestamps above corresponds with events in the timeline below:

NavigationTiming timeline

Note that each of the timestamps are Unix epoch-based, instead of being performance.timeOrigin-based like DOMHighResTimeStamps. This has been addressed in NavigationTiming2.
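If you prefer to read the Level 1 values as offsets from the start of the navigation (the way DOMHighResTimeStamps read), you can subtract navigationStart yourself. The sketch below is one way to do that; toRelativeTimings is a hypothetical helper name, and the sample numbers are taken from the example data in this article.

```javascript
// Hypothetical helper: converts Unix-epoch timestamps from a
// PerformanceTiming-like object into millisecond offsets from navigationStart.
// The attribute names are passed in explicitly, because in browsers the
// attributes live on the prototype and Object.keys() would not list them.
function toRelativeTimings(timing, names) {
  var relative = {};
  names.forEach(function(name) {
    var value = timing[name];
    // 0 means "not filled in", so keep it as 0 instead of
    // producing a huge negative offset
    relative[name] = value ? value - timing.navigationStart : 0;
  });
  return relative;
}

// Usage with a few values from the example data
var sampleTiming = {
  navigationStart: 1432762408327,
  redirectStart: 0,
  fetchStart: 1432762408648,
  responseStart: 1432762409141,
  loadEventEnd: 1432762411140
};

var offsets = toRelativeTimings(
  sampleTiming,
  ['redirectStart', 'fetchStart', 'responseStart', 'loadEventEnd']
);
// offsets: { redirectStart: 0, fetchStart: 321, responseStart: 814, loadEventEnd: 2813 }
```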

The entire process starts at timing.navigationStart (which should be the same as performance.timeOrigin). This is when your end-user started the navigation. They might have clicked on a link, or hit reload in their browser. The navigation.type property tells you what type of page-load it was: a regular navigation (link- or bookmark- click) (TYPE_NAVIGATE = 0), a reload (TYPE_RELOAD = 1), or a back-forward navigation (TYPE_BACK_FORWARD = 2). Each of these types of navigations will have different performance characteristics.
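As a small sketch, the numeric navigation.type can be mapped to a readable label before you log or beacon it. The helper name describeNavigationType and the 'reserved' and 'unknown' labels are choices made for this example; the numeric constants come from the PerformanceNavigation interface above.

```javascript
// Numeric constants from the PerformanceNavigation interface,
// mapped to labels chosen for this example
var NAVIGATION_TYPES = {
  0: 'navigate',      // TYPE_NAVIGATE
  1: 'reload',        // TYPE_RELOAD
  2: 'back_forward',  // TYPE_BACK_FORWARD
  255: 'reserved'     // TYPE_RESERVED
};

// Hypothetical helper: returns a label for a navigation.type value
function describeNavigationType(type) {
  return Object.prototype.hasOwnProperty.call(NAVIGATION_TYPES, type) ?
    NAVIGATION_TYPES[type] :
    'unknown';
}

// Usage in a browser (skipped where window.performance.navigation is missing)
if (typeof window !== 'undefined' &&
    window.performance &&
    window.performance.navigation) {
  console.log(describeNavigationType(window.performance.navigation.type));
}
```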

Around this time, the browser will also start to unload the previous page. If the previous page is the same origin (domain) as the current page, the timestamps of that document’s onunload event (start and end) will be filled in as timing.unloadEventStart and timing.unloadEventEnd. If the previous page was on another origin (or there was no previous page), these timestamps will be 0.

Next, in some cases, your site may go through one or more HTTP redirects before it reaches the final destination. navigation.redirectCount gives you an important insight into how many hops it took for your visitor to reach your page. 301 and 302 redirects each take time, so for performance reasons you should reduce the number of redirects to reach your content to 0 or 1. Unfortunately, due to security concerns, you do not have access to the actual URLs that redirected to this page, and it is entirely possible that a third-party site (not under your control) initiated the redirect. The difference between timing.redirectStart and timing.redirectEnd encompasses all of the redirects. If these values are 0, it means that either there were no redirects, or at least one of the redirects was from a different origin.

fetchStart is the next timestamp, and indicates the timestamp for the start of the fetch of the current page. If there were no redirects when loading the current page, this value should equal navigationStart. Otherwise, it should equal redirectEnd.

Next, the browser goes through the networking phases required to fetch HTML over HTTP. First the domain is resolved (domainLookupStart and domainLookupEnd), then a TCP connection is initiated (connectStart and connectEnd). Once connected, an HTTP request (with headers and cookies) is sent (requestStart). Once data starts coming back from the server, responseStart is filled in, and the response ends when the last byte from the server is read at responseEnd.

Note that the only phase without an end timestamp is the request (there is no requestEnd), as the browser does not have insight into when the server received the request.

Any of the above phases (DNS, TCP, request or response) might not take any time, such as when DNS was already resolved, a TCP connection is re-used or when content is served from disk. In this case, the timestamps should not be 0, but should reflect the timestamp that the phase started and ended, even if the duration is 0. For example, if fetchStart is at 1000 and a TCP connection is reused, domainLookupStart, domainLookupEnd, connectStart and connectEnd should all be 1000 as well.

secureConnectionStart is an optional timestamp that is only filled in if the page was loaded over a secure connection. In that case, it represents the time that the SSL/TLS handshake started.

After responseStart, there are several timestamps that represent phases of the DOM’s lifecycle. These are domLoading, domInteractive, domContentLoadedEventStart, domContentLoadedEventEnd and domComplete.

domLoading, domInteractive and domComplete correspond to when the Document’s readyState is set to the corresponding loading, interactive and complete states.

domContentLoadedEventStart and domContentLoadedEventEnd correspond to when the DOMContentLoaded event fires on the document and when it has completed running.

Finally, once the body’s onload event fires, loadEventStart is filled in. Once all of the onload handlers are complete, loadEventEnd is filled in. Note this means if you’re querying window.performance.timing from within the onload event, loadEventEnd will be 0. You could work around this by querying the timestamps from a setTimeout(..., 10) fired from within the onload event, as in the code example below.

Note: Some browsers report 0 for some timestamps. This is a bug, as all same-origin timestamps should be filled in, but if you’re consuming this data, you may have to adjust for this.

Browser vendors are also free to add their own additional timestamps to window.performance.timing. Here is the only currently known vendor-prefixed timestamp available:

  • msFirstPaint – Internet Explorer 9+ only, this event corresponds to when the first paint occurred within the document. It makes no guarantee about what content was painted — in fact, the paint could be just the "white out" prior to other content being displayed. Do not rely on this event to determine when the user started seeing actual content.

Example data

Here’s sample data from a page load:

// window.performance.navigation
redirectCount: 0
type: 0

// window.performance.timing
navigationStart: 1432762408327,
unloadEventEnd: 0,
unloadEventStart: 0,
redirectStart: 0,
redirectEnd: 0,
fetchStart: 1432762408648,
connectEnd: 1432762408886,
secureConnectionStart: 1432762408777,
connectStart: 1432762408688,
domainLookupStart: 1432762408660,
domainLookupEnd: 1432762408688,
requestStart: 1432762408886,
responseStart: 1432762409141,
responseEnd: 1432762409229,
domComplete: 1432762411136,
domLoading: 1432762409147,
domInteractive: 1432762410129,
domContentLoadedEventStart: 1432762410164,
domContentLoadedEventEnd: 1432762410263,
loadEventEnd: 1432762411140,
loadEventStart: 1432762411136

How to Use

All of the metrics exposed on the window.performance interface are available to your application via JavaScript. Here’s example code for gathering durations of the different phases of the main page load experience:

function onLoad() {
  if ('performance' in window && 'timing' in window.performance) {
    // gather after all other onload handlers have fired
    setTimeout(function() {
      var t = window.performance.timing;
      var ntData = {
        redirect: t.redirectEnd - t.redirectStart,
        dns: t.domainLookupEnd - t.domainLookupStart,
        connect: t.connectEnd - t.connectStart,
        // secureConnectionStart is 0 (or missing) when the page was not loaded over HTTPS
        ssl: t.secureConnectionStart ? (t.connectEnd - t.secureConnectionStart) : 0,
        request: t.responseStart - t.requestStart,
        response: t.responseEnd - t.responseStart,
        dom: t.loadEventStart - t.responseEnd,
        total: t.loadEventEnd - t.navigationStart
      };
      // ntData is now ready to be logged or sent to your server
    }, 0);
  }
}

window.addEventListener('load', onLoad, false);
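To make these subtractions concrete, here is a sketch that applies the same formulas to the example data shown earlier. The function name computeDurations is hypothetical; it takes any object with the PerformanceTiming attributes, so it can be tried outside of a page load.

```javascript
// Hypothetical helper: the same phase math as above, as a pure function
function computeDurations(t) {
  return {
    redirect: t.redirectEnd - t.redirectStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart,
    ssl: t.secureConnectionStart ? (t.connectEnd - t.secureConnectionStart) : 0,
    request: t.responseStart - t.requestStart,
    response: t.responseEnd - t.responseStart,
    dom: t.loadEventStart - t.responseEnd,
    total: t.loadEventEnd - t.navigationStart
  };
}

// Values copied from the example data in this article
var exampleTiming = {
  navigationStart: 1432762408327,
  redirectStart: 0,
  redirectEnd: 0,
  domainLookupStart: 1432762408660,
  domainLookupEnd: 1432762408688,
  connectStart: 1432762408688,
  secureConnectionStart: 1432762408777,
  connectEnd: 1432762408886,
  requestStart: 1432762408886,
  responseStart: 1432762409141,
  responseEnd: 1432762409229,
  loadEventStart: 1432762411136,
  loadEventEnd: 1432762411140
};

var durations = computeDurations(exampleTiming);
// durations (all in milliseconds):
//   redirect: 0, dns: 28, connect: 198, ssl: 109,
//   request: 255, response: 88, dom: 1907, total: 2813
```

Note that ssl is a subset of connect here (the TLS handshake happens inside the connection phase), so the two should not be added together.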

NavigationTiming2

Currently a Working Draft, NavigationTiming (Level 2) builds on top of NavigationTiming:

  • Now based on Resource Timing Level 2
  • Support for the Performance Timeline, including access via a PerformanceObserver
  • Support for High Resolution Time
  • Adds the next hop protocol
  • Adds transfer and content sizes
  • Adds ServerTiming
  • Adds ServiceWorker information

The Level 1 interface, window.performance.timing, will not be changed for Level 2. Level 2 features are not being added to that interface, primarily because the timestamps under window.performance.timing are not DOMHighResTimeStamp timestamps (such as 100.123), but Unix-epoch timestamps (e.g. 1420147524606).

Instead, there’s a new navigation type available from the PerformanceTimeline that contains all of the Level 2 data.

Here’s an example of how to get the new NavigationTiming data:

if ('performance' in window &&
    window.performance &&
    typeof window.performance.getEntriesByType === 'function') {
    var ntData = window.performance.getEntriesByType("navigation")[0];
}
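Since Level 2 also supports PerformanceObserver, the same entry can be received through an observer instead of being polled. The following is a minimal sketch under a few assumptions: observeNavigation is a hypothetical helper name, and its optional second argument (an observer constructor) exists only so the sketch can be exercised without a browser.

```javascript
// Hypothetical helper. `ObserverCtor` can be injected for testing; by default
// the global PerformanceObserver is used when it exists.
function observeNavigation(onEntry, ObserverCtor) {
  var PO = ObserverCtor ||
    (typeof PerformanceObserver !== 'undefined' ? PerformanceObserver : null);

  // Feature-detect: not every environment that has PerformanceObserver
  // supports the "navigation" entry type
  if (!PO ||
      !PO.supportedEntryTypes ||
      PO.supportedEntryTypes.indexOf('navigation') === -1) {
    return false;
  }

  var observer = new PO(function(list) {
    list.getEntries().forEach(function(entry) {
      onEntry(entry);
    });
  });

  // buffered: true asks for entries recorded before observe() was called
  observer.observe({ type: 'navigation', buffered: true });
  return true;
}

// Usage: returns false where navigation entries are not supported
var observing = observeNavigation(function(entry) {
  console.log('navigation entry', entry.name, entry.duration);
});
```

If observeNavigation returns false, you can fall back to getEntriesByType("navigation") or to the Level 1 window.performance.timing interface.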

Example data:

 {
    "name": "https://website.com/",
    "entryType": "navigation",
    "startTime": 0,
    "duration": 1568.5999999986961,
    "initiatorType": "navigation",
    "nextHopProtocol": "h2",
    "workerStart": 0,
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 3.600000054575503,
    "domainLookupStart": 3.600000054575503,
    "domainLookupEnd": 3.600000054575503,
    "connectStart": 3.600000054575503,
    "connectEnd": 3.600000054575503,
    "secureConnectionStart": 0,
    "requestStart": 9.700000053271651,
    "responseStart": 188.50000004749745,
    "responseEnd": 194.2999999737367,
    "transferSize": 7534,
    "encodedBodySize": 7287,
    "decodedBodySize": 32989,
    "serverTiming": [],
    "unloadEventStart": 194.90000000223517,
    "unloadEventEnd": 195.10000001173466,
    "domInteractive": 423.9999999990687,
    "domContentLoadedEventStart": 423.9999999990687,
    "domContentLoadedEventEnd": 520.9000000031665,
    "domComplete": 1562.900000018999,
    "loadEventStart": 1562.900000018999,
    "loadEventEnd": 1568.5999999986961,
    "type": "navigate",
    "redirectCount": 0
}

As you can see, all of the fields from NavigationTiming Level 1 are there (except domLoading which was removed), but they’re all DOMHighResTimeStamp timestamps now.

In addition, there are new Level 2 fields:

  • nextHopProtocol: ALPN Protocol ID such as http/0.9 http/1.0 http/1.1 h2 hq spdy/3 (ResourceTiming Level 2)
  • workerStart is the time immediately before the active Service Worker received the fetch event, if a ServiceWorker is installed
  • transferSize: Bytes transferred for the HTTP response header and content body
  • decodedBodySize: Size of the body after removing any applied content-codings
  • encodedBodySize: Size of the body prior to removing any applied content-codings
  • serverTiming: ServerTiming data
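As a rough sketch of what the size fields let you derive, the helper below estimates the response header overhead and the compression ratio from a navigation entry. summarizeNavigationEntry is a hypothetical name, and treating transferSize minus encodedBodySize as "header bytes" is an approximation that only makes sense when transferSize is non-zero. The sample sizes are the ones from the example data above.

```javascript
// Hypothetical helper: derives a few summary numbers from a
// PerformanceNavigationTiming-like entry
function summarizeNavigationEntry(entry) {
  return {
    // time from sending the request to the first byte of the response
    ttfb: entry.responseStart - entry.requestStart,
    // time spent reading the response
    download: entry.responseEnd - entry.responseStart,
    // approximation: transferSize covers the headers plus the encoded body
    headerBytes: entry.transferSize > 0 ?
      entry.transferSize - entry.encodedBodySize :
      0,
    // how much larger the body is after content-codings are removed
    compressionRatio: entry.encodedBodySize > 0 ?
      entry.decodedBodySize / entry.encodedBodySize :
      1
  };
}

// Usage with values from the example entry
var summary = summarizeNavigationEntry({
  requestStart: 9.700000053271651,
  responseStart: 188.50000004749745,
  responseEnd: 194.2999999737367,
  transferSize: 7534,
  encodedBodySize: 7287,
  decodedBodySize: 32989
});
// summary.headerBytes is 247, and summary.compressionRatio is roughly 4.5
```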

Service Workers

While NavigationTiming2 added a timestamp for workerStart, if you have a Service Worker active for your domain, there are some caveats to be aware of:

Using NavigationTiming Data

With access to all of this performance data, you are free to do with it whatever you want. You could analyze it on the client, notifying you when there are problems. You could send 100% of the data to your back-end analytics server for later analysis. Or, you could hook the data into a DIY or commercial RUM solution that does this for you automatically.

Let’s explore all of these options:

DIY

There are many DIY / Open Source solutions out there that gather and analyze data exposed by NavigationTiming.

Here are some DIY ideas for what you can do with NavigationTiming:

  • Gather the performance.timing metrics on your own and alert you if they are over a certain threshold (warning: this could be noisy)
  • Gather the performance.timing metrics on your own and XHR every page-load’s metrics to your backend for analysis
  • Watch for any pages that resulted in one or more redirects via performance.navigation.redirectCount
  • Determine what percent of users go back-and-forth on your site via performance.navigation.type
  • Accurately monitor your app’s bootstrap time that runs in the body’s onload event via (loadEventEnd - loadEventStart)
  • Monitor the performance of your DNS servers
  • Measure DOM event timestamps without adding event listeners
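As a sketch of the first idea, a threshold check can be a few lines. Everything here is illustrative: findSlowPhases is a hypothetical helper, and the threshold values are made-up numbers, not recommendations.

```javascript
// Hypothetical helper: returns the names of phases whose duration (ms)
// exceeds the matching threshold (ms)
function findSlowPhases(durations, thresholds) {
  return Object.keys(thresholds).filter(function(phase) {
    return typeof durations[phase] === 'number' &&
      durations[phase] > thresholds[phase];
  });
}

// Usage: durations as computed from performance.timing,
// thresholds chosen arbitrarily for this example
var slowPhases = findSlowPhases(
  { dns: 28, connect: 198, request: 255, total: 2813 },
  { dns: 100, request: 200, total: 2000 }
);
// slowPhases: ['request', 'total']
```

In practice you would likely aggregate across many page loads before alerting, since a single slow visitor is exactly the kind of noise mentioned above.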

Open-Source

There are some great projects out there that consume NavigationTiming information.

Boomerang, an open-source library developed by Philip Tellis, had a method for tracking performance metrics before NavigationTiming was supported in modern browsers. Today, it incorporates NavigationTiming data if available. It does all of the hard work of gathering various performance metrics, and lets you beacon (send) the data to a server of your choosing. (I am a contributor to the project).

To complement Boomerang, there are a couple of open-source servers that receive Boomerang data, such as Boomcatch and BoomerangExpress. In both cases, you’ll still be left to analyze the data on your own:

BoomerangExpress

To view NavigationTiming data for any site you visit, you can use this kaaes bookmarklet:

kaaes bookmarklet

SiteSpeed.io helps you track your site’s performance metrics and scores (such as PageSpeed and YSlow):

SiteSpeed.io

Finally, if you’re already using Piwik, there’s a plugin that gathers NavigationTiming data from your visitors:

"generation time" = responseEnd - requestStart

Piwik

Commercial Solutions

If you don’t want to build or manage a DIY / Open-Source solution to gather RUM metrics, there are many great commercial services available.

Disclaimer: I work at Akamai, on mPulse and Boomerang

Akamai mPulse, which gathers 100% of your visitors’ performance data:

Akamai mPulse

Google Analytics Site Speed:

Google Analytics Site Speed

New Relic Browser:

New Relic Browser

NeuStar WPM:

NeuStar WPM

SpeedCurve:

SpeedCurve

There may be others as well — please leave a comment if you have experience using another service.

Availability

NavigationTiming is available in all modern browsers. According to caniuse.com, 97.9% of world-wide browser market share supports NavigationTiming, as of May 2021. This includes Internet Explorer 9+, Edge, Firefox 7+, Chrome 6+, Opera 15+, Android Browser 4+, Mac Safari 8+ and iOS Safari 9+.

CanIUse NavigationTiming

Tips

Some final tips to re-iterate if you want to use NavigationTiming data:

  • Use fetchStart instead of navigationStart, unless you’re interested in redirects, browser tab initialization time, etc.
  • loadEventEnd will be 0 until after the body’s onload event has finished (so you can’t measure it in the load event itself).
  • We don’t have an accurate way to measure the "request time", as requestEnd is invisible to us (the server sees it).
  • secureConnectionStart isn’t available in Internet Explorer, and will be 0 in other browsers unless on an HTTPS link.
  • If your site is the home-page for a user, you may see some 0 timestamps. Timestamps up through the responseEnd event may be 0 duration because some browsers speculatively pre-fetch home pages (and don’t report the correct timings).
  • If you’re going to be beaconing data to your back-end for analysis, if possible, send the data immediately after the body’s onload event versus waiting for onbeforeunload. onbeforeunload isn’t 100% reliable, and may not fire in some browsers (such as iOS Safari).
  • Single-Page Apps: You’ll need a different solution for "soft" or "in-page" navigations (Boomerang has SPA support).
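To illustrate the beaconing tip, here is a minimal sketch that sends the data shortly after the load event using navigator.sendBeacon. The helper name sendTimings and the '/beacon' URL are assumptions for this example; your collection endpoint will differ, and older browsers without sendBeacon would need an XHR or image fallback that is not shown here.

```javascript
// Hypothetical helper: sends `data` as JSON to `url` via sendBeacon.
// `nav` is a navigator-like object, passed in so the sketch can be tested.
// Returns false when sendBeacon is unavailable.
function sendTimings(url, data, nav) {
  var payload = JSON.stringify(data);
  if (nav && typeof nav.sendBeacon === 'function') {
    return nav.sendBeacon(url, payload);
  }
  return false;
}

// Usage in a browser: beacon right after onload instead of waiting for unload
if (typeof window !== 'undefined') {
  window.addEventListener('load', function() {
    // wait for the other onload handlers so loadEventEnd is filled in
    setTimeout(function() {
      var t = window.performance && window.performance.timing;
      if (t) {
        // '/beacon' is a placeholder endpoint
        sendTimings('/beacon', { total: t.loadEventEnd - t.fetchStart }, window.navigator);
      }
    }, 0);
  }, false);
}
```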

Browser Bugs

NavigationTiming data may not be perfect, and in some cases may be incorrect due to browser bugs. Make sure to validate your data before you use it.

We’ve seen the following problems in the wild:

  • Safari 8/9: requestStart and responseStart might be less than navigationStart and fetchStart
  • Safari 8/9 and Chrome (as recent as 56): requestStart and responseStart might be less than fetchStart, connect* and domainLookup*
  • Chrome (as recent as 56): requestStart is equal to navigationStart but less than fetchStart, connect* and domainLookup*
  • Firefox: Reporting 0 for timestamps that should always be filled in, such as domainLookup*, connect* and requestStart.
  • Chrome: Some timestamps are double what they should be (e.g. if "now" is 1524102861420, we see timestamps around 3048205722840, year 2066)
  • Chrome: When the page has redirects, the responseStart is less than redirectEnd and fetchStart
  • Firefox: The NavigationTiming of the iframe (window.frames[0].performance.timing) does not include redirect counts or redirect times, and many other timestamps are 0

If you’re analyzing NavigationTiming data, you should ensure that all timestamps increment according to the timeline. If not, you should probably question all of the timestamps and discard them.
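One possible shape for that validation is sketched below. The helper name isTimelineSane is hypothetical, and the sketch makes two assumptions you may want to change: it treats 0 as "not filled in" and skips it, and it leaves domLoading out of the ordering because it is set once the response has started arriving and so can come before responseEnd.

```javascript
// Expected order of the Level 1 timestamps along the timeline.
// Optional timestamps (redirect*, unload*, secureConnectionStart) and
// domLoading are left out of this ordering check on purpose.
var TIMELINE_ORDER = [
  'navigationStart', 'fetchStart',
  'domainLookupStart', 'domainLookupEnd',
  'connectStart', 'connectEnd',
  'requestStart', 'responseStart', 'responseEnd',
  'domInteractive',
  'domContentLoadedEventStart', 'domContentLoadedEventEnd',
  'domComplete', 'loadEventStart', 'loadEventEnd'
];

// Hypothetical helper: true if every filled-in timestamp is greater than
// or equal to the previous filled-in timestamp
function isTimelineSane(t) {
  var previous = 0;
  for (var i = 0; i < TIMELINE_ORDER.length; i++) {
    var value = t[TIMELINE_ORDER[i]];
    if (!value) {
      continue; // 0 or missing: treated as "not filled in"
    }
    if (value < previous) {
      return false; // went backwards: do not trust this page load's data
    }
    previous = value;
  }
  return true;
}

// Usage: a requestStart that precedes fetchStart (one of the bugs above)
var looksValid = isTimelineSane({
  navigationStart: 1000,
  fetchStart: 1010,
  requestStart: 1005,
  responseStart: 1100
});
// looksValid is false
```

A stricter version could also reject timestamps that are 0 when they should be filled in, or that are implausibly far in the future.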

Some known bug reports:

Conclusion

NavigationTiming exposes valuable and accurate performance metrics in modern browsers. If you’re interested in measuring and monitoring the performance of your web app, NavigationTiming data is the first place you should look.

Next up: Interested in capturing the same network timings for all of the sub-resources on your page, such as images, JavaScript, and CSS? ResourceTiming is what you want.

Other articles in this series:

More resources:

Updates

  • 2018-04:
    • Updated caniuse.com market share
    • Updated NavigationTiming2 information, usage, fields
    • Added more browser bugs that we’ve found
  • 2021-05:
    • Updated caniuse.com market share
    • Added a Service Workers section
    • Replaced usage of performance.timing.navigationStart as a time origin with performance.timeOrigin
    • Minor grammar updates
    • Added a Table of Contents

Measuring the Performance of Your Web Apps

May 25th, 2015

You know that performance matters, right?

Just a few seconds slower and your site could be turning away thousands (or millions) of visitors. Don’t take my word for it: there are plenty of case studies, articles, findings, presentations, charts and more showing just how important it is to make your site load quickly. Google is even starting to shame-label slow sites. You don’t want to be that guy.

So how do you monitor and measure the performance of your web apps?

The performance of any system can be measured from several different points of view. Let’s take a brief look at three of the most common performance viewpoints for a web app: from the eyes of the developer, the server and the end-user.

This is the beginning of a series of articles that will expand upon the content given during my talk "Make it Fast: Using Modern Browser APIs to Monitor and Improve the Performance of your Web Applications" at CodeMash 2015.

Developer

The developer’s machine is the first line of defense in ensuring your web application is performing as intended. While developing your app, you are probably building, testing and addressing performance issues as you see them.

In addition to simply using your app, there are many tools you can use to measure how it’s performing. Some of my favorites are:

While ensuring everything is performing well on your development machine (which probably has tons of RAM, CPU and a quick connection to your servers) is a good first step, you also need to make sure your app is playing well with other services on your network, such as your web server, database, etc.

Server

Monitoring the server(s) that run your infrastructure (such as web, database, and other back-end services) is critical for a performance monitoring strategy. Many resources and tools have been developed to help engineers monitor what their servers are doing. Performance monitoring at the server level is critical for reliability (ensuring your core services are running) and scalability (ensuring your infrastructure is performing at the level you want).

From each of your servers’ points of view, there are several components that you can monitor to have visibility into how your infrastructure is performing. Some common monitoring and measuring tools are:

By putting these tools together, you can get a pretty good sense of how your overall infrastructure is performing.

End-user

So you’ve developed your app, deployed it to production, and have been monitoring your infrastructure closely to ensure all of your servers are performing smoothly.

Everything should be golden, right? Your end-users are having a fantastical experience and every one of them just loves visiting your site.

… clearly, that’s probably not the case. The majority of your end-users don’t surf the web on $3,000 development machines, using the latest cutting-edge browser on a low-latency link from your datacenter. A lot of your users are probably on a low-end tablet, on a cell network, 2,000 miles away from your datacenter.

The experience you’ve curated while developing your web app on your high-end development machine will probably be the best experience possible. All of your visitors will likely experience something worse, from not-a-noticeable-difference down to can’t-stand-how-slow-it-is-and-will-never-come-back.

Measuring performance from the server and the developer’s perspective is not the full story. In the end, the only thing that really matters is what your visitor sees, and the experience they have.

Just a few years ago, the web development community didn’t have a lot of tools available to monitor the performance from their end-users’ perspectives. Sure, you could capture simple JavaScript timestamps within your code:

var startTime = Date.now();
// do stuff
var elapsedTime = Date.now() - startTime;

You could spread this code throughout your app and listen for browser events such as onload, but simple timestamps don’t give a lot of visibility into the performance of your end-users.

In addition, since this style of timestamp/profiling is just JavaScript, you have zero visibility into the browser’s networking performance and what happened before the browser parsed your HTML and JavaScript.

W3C Webperf Working Group

To solve these issues, in 2010 the W3C (a standards body in charge of developing web standards such as HTML5, CSS, etc.) formed a new working group with the mission of giving developers the ability to assess and understand the performance characteristics of their web apps.

The W3C webperf working group is an organization whose members include Microsoft, Google, Mozilla, Opera, Facebook, Netflix, SOASTA and more. The working group collaboratively develops standards with the following goals:

  • Expose information that was not previously available
  • Give developers the tools they need to make their applications more efficient
  • Little to no overhead
  • Easy to understand APIs

Since its inception, the working group has published a number of standards, many of which are available in modern browsers today. Some of these standards are:

adblock-detector.js

March 22nd, 2014

I run advertising on several of my websites, mostly through Google AdSense. My sites are free communities that don’t otherwise sell products, so advertising is the main way I cover operational expenses. AdSense has been a great partner over the years and the ads they serve aren’t too obtrusive.

However, I realize that many people see all advertising as annoying, and some run ad-blockers in their browser to filter out ads. AdBlock Plus and others are becoming more popular every year.

Since advertising is such an important part of my business, I wanted to quantify what percentage of ads were being hidden by my visitors’ ad-blockers. I did a bit of testing to determine how to detect if my ads were being blocked, then ran an experiment on two of my sites. The first site, with a travel focus, saw approximately 9.4% of ads being blocked by visitors. The second site, with a gaming focus, had over 26% of ads blocked. The industry average is around 23%.

While the ad-block rates are fairly high, I’m honestly not upset or surprised by the results. Generally, people that have an ad-blocker installed won’t be the kind of audience that is likely to click on an ad. In fact, I often run an ad-blocker myself. However, knowing which visitors have blocked the ads gives me an important metric to track in my analytics. It also offers me the opportunity to give one last plea to the visitor by subtly asking them to support the site via donations if they visit often.

What I don’t want to do is annoy any visitors that are using ad-blockers with my plea, but I do think there’s an opportunity, if you’re respectful with your request, to gently suggest to the visitor an alternate method of supporting the site. Below are screenshots of what sarna.net looks like if you visit with an ad-blocker installed.

[Screenshot: sarna-ad-blocker-full]

Zoomed in, you can see I provide the visitor alternate means of supporting the site, as well as a way to disable the message for 100 days if they find it annoying:

[Screenshot: sarna-ad-blocker-zoomed]

Since this prompt is text-only and a muted color, I feel that it is an unobtrusive, respectful way of reaching out to the visitor.  So far, I haven’t had any complaints about the new prompt — and I’ve had a few donations as well.  A very small percentage click on the “hide this message…” link.

The logic to detect ad-blocking is fairly straightforward, though there are a few caveats when detecting cross-browser.  Other sites might find it useful, so I’ve packaged it up into a new module called adblock-detector.js.  I’ve only tested it in a limited environment (IE, Chrome and Firefox with AdBlock Plus), so I’m looking for help from others that can test other browsers, browser versions, ad-blockers and ad publishers.
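As a rough illustration of the general idea, the sketch below treats an ad container as blocked when it is missing or has collapsed to zero size. This is a generic approach under stated assumptions, not necessarily how adblock-detector.js works internally; `looksBlocked` and `blockedPercentage` are hypothetical names, and real-world detection has to handle the cross-browser caveats mentioned above.

```javascript
// Generic sketch, NOT the actual adblock-detector.js logic: treats an ad
// container as blocked if it is missing or has collapsed to zero size.
function looksBlocked(adElement) {
    if (!adElement) {
        return true; // the element was removed from the page entirely
    }
    return adElement.offsetHeight === 0 || adElement.offsetWidth === 0;
}

// Hypothetical helper: percentage of checked ad slots that look blocked
function blockedPercentage(adElements) {
    if (adElements.length === 0) {
        return 0;
    }
    var blocked = adElements.filter(function (el) {
        return looksBlocked(el);
    }).length;
    return (blocked / adElements.length) * 100;
}
```

In a browser, you would call something like `looksBlocked(document.getElementById("ad-slot"))` (where `ad-slot` is a placeholder ID) only after the ad has had a chance to load, since a slow ad can look the same as a blocked one.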

You can use adblock-detector.js to collect metrics on your ad-block rate, or to appeal to your visitors as I’m doing.  I provide examples for both in the repository.

Please use the knowledge gained for good (e.g. analytics, subtle prompts), not evil (e.g. more ads).

If you want a fully-baked solution, I would also recommend PageFair, which can help you track your ad-block rate, and more.

adblock-detector.js is free, open-source, and available on GitHub.

How to deal with a WordPress wp-comments-post.php SPAM attack

May 9th, 2013

This morning I woke up to several website monitoring alarms going off.  My websites were becoming intermittently unavailable due to extremely high server load (>190).  It appears nicj.net had been under a WordPress comment-SPAM attack from thousands of IP addresses overnight.  After a few hours of investigation, configuration changes and cleanup, I think I’ve resolved the issue.  I’m still under attack, but the changes I’ve made have removed all of the comment SPAM and have reduced the server load back to normal.

Below is a chronicle of how I investigated the problem, how I cleaned up the SPAM, and how I’m preventing it from happening again.

Investigation

The first thing I do when website monitoring alarms are going off (I use Pingdom and Cacti) is to log into the server and check its load.  Load is an indicator of how busy your server is.  Anything greater than the number of CPUs on your server is cause for alarm.  My load is usually around 2.0 — when I logged in, it was 196:

[nicjansma@server3 ~]$ uptime
06:09:48 up 104 days, 11:25,  1 user,  load average: 196.32, 167.75, 156.40

Next, I checked top and found that mysqld was likely the cause of the high load because it was using 200-1000% of the CPU:

top - 06:16:45 up 104 days, 11:32, 2 users, load average: 97.69, 162.31, 161.74
Tasks: 597 total, 1 running, 596 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.8%us, 19.1%sy, 0.0%ni, 10.7%id, 66.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 12186928k total, 12069408k used, 117520k free, 5868k buffers
Swap: 4194296k total, 2691868k used, 1502428k free, 3894808k cached

PID   USER  PR NI VIRT RES  SHR  S %CPU  %MEM TIME+ COMMAND
24846 mysql 20 0 26.6g 6.0g 2.6g S 260.6 51.8 18285:17 mysqld

Using SHOW PROCESSLIST in MySQL (via phpMyAdmin), I saw about 100 processes working on the wp_comments table in the nicj.net WordPress database.

I was already starting to guess that I was under some sort of WordPress comment SPAM attack, so I checked my Apache access_log and found nearly 800,000 POSTs to wp-comments-post.php since yesterday. They all looked a bit like this:

[nicjansma@server3 ~]$ grep POST access_log
36.248.44.7 - - [09/May/2013:06:07:29 -0700] "POST /wp-comments-post.php HTTP/1.1" 302 20 "http://nicj.net/2009/04/01/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"

What’s worse, the SPAMs were coming from over 3,000 unique IP addresses.  Essentially, it was a distributed denial of service (DDoS) attack:

[nicjansma@server3 ~]$ grep POST access_log | awk '{print $1}' | sort | uniq -c | wc -l
3105
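If you would rather do the same count from a script, the pipeline above can be sketched in JavaScript. This is a minimal sketch that assumes the log uses the Apache common/combined format, where the client IP is the first whitespace-separated field; `countUniquePostIps` is a hypothetical name.

```javascript
// Counts distinct client IPs among access-log lines that contain "POST",
// mirroring: grep POST access_log | awk '{print $1}' | sort | uniq -c | wc -l
function countUniquePostIps(logText) {
    var ips = new Set();
    logText.split("\n").forEach(function (line) {
        // Like grep POST: skip any line that does not mention POST
        if (line.indexOf("POST") === -1) {
            return;
        }
        // Like awk's $1: take the first whitespace-separated field
        var ip = line.trim().split(/\s+/)[0];
        if (ip) {
            ips.add(ip);
        }
    });
    return ips.size;
}
```

In Node.js, you could feed it the file contents with something like `countUniquePostIps(require("fs").readFileSync("access_log", "utf8"))`.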

NicJ.net was getting hundreds of thousands of POSTs to wp-comments-post.php, which was causing Apache and MySQL to do a whole lot of work checking them against Akismet for SPAM and saving them to the WordPress database. I logged into the WordPress Admin interface, which verified the problem as well:

There are 809,345 comments in your spam queue right now.

Yikes!

Stopping the Attack

First things first: if you’re under an attack like this, the quickest way to stop it is to disable comments on your WordPress site. There are a few ways of doing this.

One way is to go into Settings > Discussion > and un-check Allow people to post comments on new articles.

The second way is to rename wp-comments-post.php, which is what spammers use directly to add comments to your blog. I temporarily renamed my file to wp-comments-post.php.bak so I could change it back later. In addition, I created a 0-byte placeholder file called wp-comments-post.php so the POSTs will look to the spammers like they succeeded, but the 0-byte file takes up fewer server resources than a 404 page:

[nicjansma@server3 ~]$ mv wp-comments-post.php wp-comments-post.php.bak && touch wp-comments-post.php

Either of these methods should stop the SPAM attack immediately.  5 minutes after I did this, my server load was back down to ~2.0.

Now that the spammers are essentially POSTing data to your blank wp-comments-post.php file, new comments shouldn’t be appearing in your blog.  While this will reduce the overhead of the SPAM attack, they are still consuming your bandwidth and web server connections with their POSTs.  To stop the spammers from even sending a single packet to your webserver, you can create a small script that automatically drops packets from IPs that are posting several times to wp-comments-post.php.  This is easily done via a simple script like my Autoban Website Spammers via the Apache Access log post.  Change THRESHOLD to something small like 10, and SEARCHTERM to wp-comments-post.php and you will be automatically dropping packets from IPs that try to post more than 10 comments a day.
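The thresholding idea can be sketched as follows. This is not the Autoban script itself, only an illustration of the logic under stated assumptions: `findOffenders` is a hypothetical name, the client IP is assumed to be the first field of each log line, and an IP counts as an offender once it has more than THRESHOLD matching requests. Actually dropping packets (for example, with a firewall rule) is left out.

```javascript
// Values suggested in the text; the names mirror the script's settings
var THRESHOLD = 10;
var SEARCHTERM = "wp-comments-post.php";

// Hypothetical helper: returns the IPs that appear on more than
// `threshold` log lines containing `searchTerm`.
function findOffenders(lines, searchTerm, threshold) {
    var counts = new Map();
    lines.forEach(function (line) {
        if (line.indexOf(searchTerm) === -1) {
            return; // not a request we care about
        }
        // Assumption: the client IP is the first space-separated field
        var ip = line.split(" ")[0];
        counts.set(ip, (counts.get(ip) || 0) + 1);
    });

    var offenders = [];
    counts.forEach(function (count, ip) {
        if (count > threshold) {
            offenders.push(ip);
        }
    });
    return offenders;
}
```

Calling `findOffenders(logLines, SEARCHTERM, THRESHOLD)` on one day's log lines yields the list of IPs you would then hand to your firewall.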

Cleaning up the Mess

At this point, I still had 800,000+ SPAMs in my WordPress moderation queue. I feel bad for Akismet, which actually classified them all!

I tried removing the SPAM comments by going to Comments > Spam > Empty Spam, but I think it was too much for Apache to handle and it crashed.  Time to remove them from MySQL instead!

Via phpMyAdmin, I found that not only were there 800,000+ SPAMs in the database, the wp_comments table was over 3.6 GB and the wp_commentmeta was at 8.1 GB!

Here’s how to clean out the wp_comments table from any comments marked as SPAM:

DELETE FROM wp_comments WHERE comment_approved = 'spam';

OPTIMIZE TABLE wp_comments;

In addition to the wp_comments table, the wp_commentmeta table has metadata about all of the comments. You can safely remove any comment metadata for comments that are no longer there:

DELETE FROM wp_commentmeta WHERE comment_id NOT IN (SELECT comment_id FROM wp_comments);

OPTIMIZE TABLE wp_commentmeta;

For me, this removed 800,000+ rows of wp_comments (bringing it down from 3.6 GB to just 207 KB) and 2,395,512 rows of wp_commentmeta (bringing it down from 8.1 GB to just 136 KB).

Preventing Future Attacks

There are a few preventative measures you can take to stop SPAM attacks like these.

NOTE: Remember to rename your wp-comments-post.php.bak (or turn Comments back on) after you’re happy with the prevention techniques you’re using.

  1. Disable Comments on your blog entirely (Settings > Discussion > un-check Allow people to post comments on new articles). This is probably not desirable for most people.
  2. Turn off Comments for older posts (spammers seem to target older posts that rank higher in search results). Here’s a way to disable comments automatically after 30 days.
  3. Rename wp-comments-post.php to something else, such as my-comments-post.php. Comment spammers often just assume your code is at the wp-comments-post.php URL and won’t check your site’s HTML to verify this is the case. If you rename wp-comments-post.php and change all occurrences of that URL in your theme, your site should continue to work while the spammers hit a bogus URL. You can follow this renaming guide for more details.
  4. Enable a Captcha for your comments so automated bots are less likely to be able to SPAM your blog. I’ve had great success with Are You A Human.
  5. The Autoban Website Spammers via the Apache Access log post describes my method for automatically dropping packets from bad citizen IP addresses.

After all of these changes, my server load is back to normal and I’m not getting any new SPAM comments.  The DDoS is still hitting my server, but their IP addresses are slowly getting packets dropped via my script every 10 minutes.

Hopefully these steps can help others out there.  Good luck! Fighting spammers is a never-ending battle!

UserTiming.js

April 15th, 2013

UserTiming is one of the W3C specs that I helped design while working at Microsoft through the W3C WebPerf working group.  It helps developers measure the performance of their web applications by giving them access to high precision timestamps. It also provides a standardized API that analytics scripts and developer tools can use to display performance metrics.
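For readers who have not used the API, here is a minimal sketch of the standard `mark()` / `measure()` / `getEntriesByName()` calls. The wrapper `timeWithUserTiming` and its `perf` parameter are hypothetical names used for illustration; in a browser that supports UserTiming you would pass `window.performance`.

```javascript
// Hypothetical wrapper around the standard UserTiming calls:
// marks the start and end of some work, then measures between the marks.
function timeWithUserTiming(perf, label, work) {
    var startMark = label + ":start";
    var endMark = label + ":end";

    perf.mark(startMark);
    var result = work();
    perf.mark(endMark);

    // Creates a "measure" entry spanning the two marks
    perf.measure(label, startMark, endMark);

    // Read the most recent measure back from the PerformanceTimeline
    var measures = perf.getEntriesByName(label, "measure");
    var latest = measures[measures.length - 1];
    return { result: result, duration: latest.duration };
}
```

Because the marks and measures live in the browser's PerformanceTimeline, analytics scripts and developer tools can read the same entries without any extra plumbing.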

UserTiming is natively supported in IE 10 and prefixed in Chrome 25+.  I wanted to use the interface for a few of my projects so I created a small polyfill to help patch other browsers that don’t support it natively. Luckily, a JavaScript version of UserTiming can be implemented and be 100% API functional — you just lose some precision and performance vs. native browser support.

So here it is: UserTiming.js

README:

UserTiming.js is a polyfill that adds UserTiming support to browsers that do not natively support it.

UserTiming is accessed via the PerformanceTimeline, and requires window.performance.now() support, so UserTiming.js adds a limited version of these interfaces if the browser does not support them (which is likely the case if the browser does not natively support UserTiming).

As of 2013-04-15, UserTiming is natively supported by the following browsers:

  • IE 10+
  • Chrome 25+ (prefixed)

UserTiming.js has been verified to add UserTiming support to the following browsers:

  • IE 6-9
  • Firefox 3.6+ (previous versions not tested)
  • Safari 4.0.5+ (previous versions not tested)
  • Opera 10.50+ (previous versions not tested)

UserTiming.js will detect native implementations of UserTiming, window.performance.now() and the PerformanceTimeline and will not make any changes if those interfaces already exist.  When a prefixed version is found, it is copied over to the unprefixed name.
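The detection order described above can be illustrated for `performance.now()` alone. This is a simplified sketch rather than the actual UserTiming.js source: `ensurePerformanceNow` is a hypothetical name, the list of vendor prefixes is an assumption, and the Date-based fallback only offers millisecond precision.

```javascript
// Simplified sketch (not the real UserTiming.js code): make sure
// win.performance.now() exists, preferring native, then prefixed,
// then a Date-based fallback. Returns which path was taken.
function ensurePerformanceNow(win) {
    if (!win.performance) {
        win.performance = {};
    }
    var perf = win.performance;

    // 1. Native implementation: leave it untouched
    if (typeof perf.now === "function") {
        return "native";
    }

    // 2. Prefixed implementation: copy it to the unprefixed name
    var prefixes = ["webkit", "moz", "ms"]; // assumed prefix list
    for (var i = 0; i < prefixes.length; i++) {
        var prefixedName = prefixes[i] + "Now";
        if (typeof perf[prefixedName] === "function") {
            perf.now = perf[prefixedName];
            return "prefixed";
        }
    }

    // 3. Fallback: milliseconds since this function ran (lower precision)
    var start = Date.now();
    perf.now = function () {
        return Date.now() - start;
    };
    return "polyfilled";
}
```

In a browser you would call `ensurePerformanceNow(window)` once at startup; taking the window as a parameter simply makes the sketch easy to exercise with plain objects.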

UserTiming.js can be found on GitHub and as the npm usertiming module.