adblock-detector.js

March 22nd, 2014

I run advertising on several of my websites, mostly through Google AdSense. My sites are free communities that don’t otherwise sell products, so advertising is the main way I cover operational expenses. AdSense has been a great partner over the years and the ads they serve aren’t too obtrusive.

However, I realize that many people see all advertising as annoying, and some run ad-blockers in their browser to filter out ads. AdBlock Plus and others are becoming more popular every year.

Since advertising is such an important part of my business, I wanted to quantify what percentage of ads were being hidden by my visitors’ ad-blockers. I did a bit of testing to determine how to detect whether my ads were being blocked, then ran an experiment on two of my sites. The first site, with a travel focus, saw approximately 9.4% of ads being blocked by visitors. The second site, with a gaming focus, had over 26% of ads blocked.  The industry average is around 23%.

While the ad-block rates are fairly high, I’m honestly not upset or surprised by the results. Generally, people who have an ad-blocker installed aren’t the kind of audience that is likely to click on an ad. In fact, I often run an ad-blocker myself. However, knowing which visitors have blocked the ads gives me an important metric to track in my analytics. It also offers me the opportunity to make one last plea to the visitor by subtly asking them to support the site via donations if they visit often.

What I don’t want to do is annoy visitors who use ad-blockers with my plea, but I do think there’s an opportunity, if you’re respectful with your request, to gently suggest an alternate method of supporting the site. Below are screenshots of what sarna.net looks like if you visit with an ad-blocker installed.

sarna-ad-blocker-full

Zoomed in, you can see I provide the visitor alternate means of supporting the site, as well as a way to disable the message for 100 days if they find it annoying:

sarna-ad-blocker-zoomed

Since this prompt is text-only and a muted color, I feel that it is an unobtrusive, respectful way of reaching out to the visitor.  So far, I haven’t had any complaints about the new prompt — and I’ve had a few donations as well.  A very small percentage click on the “hide this message…” link.

The logic to detect ad-blocking is fairly straightforward, though there are a few cross-browser caveats.  Other sites might find it useful, so I’ve packaged it up into a new module called adblock-detector.js.  I’ve only tested it in a limited environment (IE, Chrome and Firefox with AdBlock Plus), so I’m looking for help from others who can test other browsers, browser versions, ad-blockers and ad publishers.
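The core of that detection logic can be sketched in a few lines. This is a hedged illustration of the common "bait element" technique, not adblock-detector.js’s actual source; the class names and timeout below are assumptions:

```javascript
// Hedged sketch of the common bait-element approach to ad-block detection.
// The class names and the 100ms delay are illustrative assumptions, not
// necessarily what adblock-detector.js itself uses.

// The core check: ad-blockers hide matching elements, so a blocked bait
// element ends up with no rendered height.
function isBaitBlocked(el) {
  return !el || el.offsetHeight === 0;
}

// Browser-only setup (guarded so this file can also load outside a browser).
function detectAdBlock(callback) {
  if (typeof document === 'undefined') {
    return;
  }
  var bait = document.createElement('div');
  bait.className = 'ad ads adsbox'; // class names ad-blockers typically target
  bait.innerHTML = '&nbsp;';
  document.body.appendChild(bait);
  // Give the ad-blocker a moment to hide the element before checking.
  setTimeout(function () {
    callback(isBaitBlocked(bait));
    document.body.removeChild(bait);
  }, 100);
}
```

The callback receives `true` when the bait was collapsed, at which point you can log the metric or show a donation prompt.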

You can use adblock-detector.js to collect metrics on your ad-block rate, or to appeal to your visitors as I’m doing.  I provide examples for both in the repository.

Please use the knowledge gained for good (e.g. analytics, subtle prompts), not evil (e.g. more ads).

If you want a fully-baked solution, I would also recommend PageFair, which can help you track your ad-block rate, and more.

adblock-detector.js is free, open-source, and available on GitHub.

Using Phing for Fun and Profit

February 12th, 2014

I gave a short presentation on using Phing as a PHP build system at GrPhpDev on 2014-02-11. It’s available on SlideShare:

Using Phing for Fun and Profit

The presentation and examples are also available on GitHub.

ChecksumVerifier – A Windows Command-Line Tool to Verify the Integrity of your Files

February 4th, 2014

Several years ago I wrote a small tool called ChecksumVerifier. It maintains a database of files and their checksums, and helps you verify that the files have not changed. I use it on my external hard drive backups to validate that the files are not being corrupted due to bitrot or other disk corruption.  At the time I created it, there were several other simple and commercial Windows apps that would do the same thing, but nothing was free and command-line based.  I’ve been meaning to clean it up so I could open-source it, which I finally had the time to do last weekend.

A checksum is a short string of 32–128 characters (depending on the specific algorithm used) that is calculated by reading the input file and applying a mathematical algorithm to its contents. Even a file as large as 1TB will have only a small 32–128 character checksum, so checksums are an efficient way of recording the file’s state without saving its entire contents. ChecksumVerifier uses the MD5, SHA-1, SHA-256 and SHA-512 algorithms, which are generally collision-resistant enough for validating the integrity of file contents.

One example usage of ChecksumVerifier is verifying the integrity of external hard drive backups. After saving files to an external disk, you can run ChecksumVerifier -update to calculate the checksums of all of the files on the external disk. At a later date, if you want to validate that the files on the disk have not been added, removed or changed, you can run ChecksumVerifier -verify and it will re-calculate all of the disk’s checksums and compare them to the original database to see if any files have changed in any way.

ChecksumVerifier is pretty flexible and has several command line options:

Usage: ChecksumVerifier.exe [-update | -verify] -db [xml file] [options]

actions:
     -update:                Update checksum database
     -verify:                Verify checksum database

required:
     -db [xml file]          XML database file

options:
     -match [match]          Files to match (glob pattern such as * or *.jpg or ??.foo) (default: *)
     -exclude [match]        Files to exclude (glob pattern such as * or *.jpg or ??.foo) (default: empty)
     -basePath [path]        Base path for matching (default: current directory)
     -r, -recurse            Recurse (directories only, default: off)

path storage options:
     -relativePath           Relative path (default)
     -fullPath               Full path
     -fullPathNodrive        Full path - no drive letter

checksum options:
     -md5                    MD5 (default)
     -sha1                   SHA-1
     -sha256                 SHA-2 256 bits
     -sha512                 SHA-2 512 bits

-verify options:
     -ignoreMissing          Ignore missing files (default: off)
     -showNew                Show new files (default: off)
     -ignoreChecksum         Don't calculate checksum (default: off)

-update options:
     -removeMissing          Remove missing files (default: off)
     -ignoreNew              Don't add new files (default: off)
     -pretend                Show what would happen - don't write out XML (default: off)

ChecksumVerifier is free, open-source and available on GitHub.

Minifig Collector v11.0

October 8th, 2013

My original Minifig Collector app (the first Android app I ever created), which has seen over 150,000 installs, just got a major facelift and some new features in version 11.0.  It now has a more modern-looking UI, can import/export your figures to Brickset, and lets you finger-swipe back and forth. Check it out!

Screenshots:

main list
browse brickset

Unofficial LEGO® Minifigure Catalog v2.0

September 13th, 2013

Over the past few weeks I’ve been working on version 2.0 of the Unofficial LEGO® Minifigure Catalog app. We’ve just released version 2.0 to the Apple iTunes and Google Play app stores.

Version 2.0 introduces tablet support along with a complete visual facelift. In addition, there are several performance improvements that make the app much faster when browsing, and I’ve added the ability to browse minifigures, sets and heads by name (in addition to by year and theme).

all-3

Check it out!

Note: LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this app.

SaltThePass mobile app now available on iTunes, Google Play and Amazon

July 31st, 2013

A few months ago I released SaltThePass.com, which is a password generator that will help you generate unique, secure passwords for all of the websites you visit based on a single Master Password that you remember.

I’ve been working on a mobile / offline iOS and Android app that gives you all of the features of the saltthepass.com website.  The apps are now in the Apple iTunes App Store, Google Play App Store and the Amazon Appstore.

Let me know if you use them!
iPad, iPhone and Android apps available

How to deal with a WordPress wp-comments-post.php SPAM attack

May 9th, 2013

This morning I woke up to several website monitoring alarms going off.  My websites were becoming intermittently unavailable due to extremely high server load (>190).  It appears nicj.net had been under a WordPress comment-SPAM attack from thousands of IP addresses overnight.  After a few hours of investigation, configuration changes and cleanup, I think I’ve resolved the issue.  I’m still under attack, but the changes I’ve made have removed all of the comment SPAM and have reduced the server load back to normal.

Below is a chronicle of how I investigated the problem, how I cleaned up the SPAM, and how I’m preventing it from happening again.

Investigation

The first thing I do when website monitoring alarms are going off (I use Pingdom and Cacti) is to log into the server and check its load.  Load is an indicator of how busy your server is.  Anything greater than the number of CPUs on your server is cause for alarm.  My load is usually around 2.0 — when I logged in, it was 196:

[nicjansma@server3 ~]$ uptime
06:09:48 up 104 days, 11:25,  1 user,  load average: 196.32, 167.75, 156.40

Next, I checked top and found that mysqld was likely the cause of the high load because it was using 200-1000% of the CPU:

top - 06:16:45 up 104 days, 11:32, 2 users, load average: 97.69, 162.31, 161.74
Tasks: 597 total, 1 running, 596 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.8%us, 19.1%sy, 0.0%ni, 10.7%id, 66.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 12186928k total, 12069408k used, 117520k free, 5868k buffers
Swap: 4194296k total, 2691868k used, 1502428k free, 3894808k cached

PID   USER  PR NI VIRT RES  SHR  S %CPU  %MEM TIME+ COMMAND
24846 mysql 20 0 26.6g 6.0g 2.6g S 260.6 51.8 18285:17 mysqld

Using SHOW PROCESSLIST in MySQL (via phpMyAdmin), I saw about 100 processes working on the wp_comments table in the nicj.net WordPress database.

I was already starting to suspect some sort of WordPress comment SPAM attack, so I checked my Apache access_log and found nearly 800,000 POSTs to wp-comments-post.php since yesterday.  They all look a bit like this:

[nicjansma@server3 ~]$ grep POST access_log
36.248.44.7 - - [09/May/2013:06:07:29 -0700] "POST /wp-comments-post.php HTTP/1.1" 302 20 "http://nicj.net/2009/04/01/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;)"

What’s worse, the SPAMs were coming from over 3,000 unique IP addresses.  Essentially, it was a distributed denial of service (DDoS) attack:

[nicjansma@server3 ~]$ grep POST access_log | awk '{print $1}' | sort | uniq -c | wc -l
3105
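The same count can be derived programmatically. Here’s a small Node.js sketch that mirrors that shell pipeline; it assumes Apache’s common/combined log format, where the client IP is the first space-separated field:

```javascript
// Sketch: count unique IPs POSTing to wp-comments-post.php, mirroring
//   grep POST access_log | awk '{print $1}' | sort | uniq -c | wc -l
// Assumes Apache common/combined log format (client IP is field 1).
function countSpammerIPs(logLines) {
  const ips = new Set();
  for (const line of logLines) {
    if (line.includes('POST /wp-comments-post.php')) {
      ips.add(line.split(' ')[0]); // first field is the client IP
    }
  }
  return ips.size;
}
```

Reading the log with `fs.readFileSync('access_log', 'utf8').split('\n')` and feeding it to this function gives the unique-attacker count without the shell pipeline.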

NicJ.net was getting hundreds of thousands of POSTs to wp-comments-post.php, which was causing Apache and MySQL to do a lot of work checking them against Akismet for SPAM and saving them to the WordPress database.  I logged into the WordPress Admin interface, which confirmed the problem:

There are 809,345 comments in your spam queue right now.

Yikes!

Stopping the Attack

First things first: if you’re under an attack like this, the quickest way to stop it is to disable comments on your WordPress site.  There are a few ways of doing this.

One way is to go into Settings > Discussion > and un-check Allow people to post comments on new articles.

The second way is to rename wp-comments-post.php, which is the file spammers POST to directly to add comments to your blog.  I renamed my file wp-comments-post.php.bak temporarily, so I could change it back later.  In addition, I created a 0-byte placeholder file called wp-comments-post.php so the POSTs look to the spammers like they succeeded, but the 0-byte file uses fewer server resources than a 404 page:

[nicjansma@server3 ~]$ mv wp-comments-post.php wp-comments-post.php.bak && touch wp-comments-post.php

Either of these methods should stop the SPAM attack immediately.  Five minutes after I did this, my server load was back down to ~2.0.

Now that the spammers are essentially POSTing data to your blank wp-comments-post.php file, new comments shouldn’t be appearing in your blog.  While this will reduce the overhead of the SPAM attack, they are still consuming your bandwidth and web server connections with their POSTs.  To stop the spammers from even sending a single packet to your webserver, you can create a small script that automatically drops packets from IPs that are posting several times to wp-comments-post.php.  This is easily done via a simple script like my Autoban Website Spammers via the Apache Access log post.  Change THRESHOLD to something small like 10, and SEARCHTERM to wp-comments-post.php and you will be automatically dropping packets from IPs that try to post more than 10 comments a day.

Cleaning up the Mess

At this point, I still had 800,000+ SPAMs in my WordPress moderation queue.  I feel bad for Akismet; it actually classified them all!

I tried removing the SPAM comments by going to Comments > Spam > Empty Spam, but I think it was too much for Apache to handle and it crashed.  Time to remove them from MySQL instead!

Via phpMyAdmin, I found that not only were there 800,000+ SPAMs in the database, but the wp_comments table was over 3.6 GB and the wp_commentmeta table was at 8.1 GB!

Here’s how to clean out the wp_comments table from any comments marked as SPAM:

DELETE FROM wp_comments WHERE comment_approved = 'spam';

OPTIMIZE TABLE wp_comments

In addition to the wp_comments table, the wp_commentmeta table has metadata about all of the comments. You can safely remove any comment metadata for comments that are no longer there:

DELETE FROM wp_commentmeta WHERE comment_id NOT IN (SELECT comment_id FROM wp_comments)

OPTIMIZE TABLE wp_commentmeta

For me, this removed 800,000+ rows of wp_comments (bringing it down from 3.6 GB to just 207 KB) and 2,395,512 rows of wp_commentmeta (bringing it down from 8.1 GB to just 136 KB).

Preventing Future Attacks

There are a few preventative measures you can take to stop SPAM attacks like these.

NOTE: Remember to rename your wp-comments-post.php.bak (or turn Comments back on) after you’re happy with the prevention techniques you’re using.

  1. Disable Comments on your blog entirely (Settings > Discussion > Allow people to post comments on new articles.) (probably not desirable for most people)
  2. Turn off Comments for older posts (spammers seem to target older posts that rank higher in search results). Here’s a way to disable comments automatically after 30 days.
  3. Rename wp-comments-post.php to something else, such as my-comments-post.php. Comment spammers often just assume your code is at the wp-comments-post.php URL and won’t check your site’s HTML to verify this is the case. If you rename wp-comments-post.php and change all occurrences of that URL in your theme, your site should continue to work while the spammers hit a bogus URL. You can follow this renaming guide for more details.
  4. Enable a Captcha for your comments so automated bots are less likely to be able to SPAM your blog. I’ve had great success with Are You A Human.
  5. The Autoban Website Spammers via the Apache Access log post describes my method for automatically dropping packets from bad citizen IP addresses.

After all of these changes, my server load is back to normal and I’m not getting any new SPAM comments.  The DDoS is still hitting my server, but their IP addresses are slowly getting packets dropped via my script every 10 minutes.

Hopefully these steps can help others out there.  Good luck! Fighting spammers is a never-ending battle!

2012 Minifigures Available

April 17th, 2013

Thanks to Christoph’s hard work taking photos of all 529 minifigures released in 2012, the 2012 minifigs are now available for purchase in the Unofficial Minifigure Catalog app.

To purchase the update, first update the database to the latest version (Settings > Database) and then go to Settings > Collections and look for the purchase button.

UserTiming.js

April 15th, 2013

UserTiming is one of the W3C specs that I helped design while working at Microsoft through the W3C WebPerf working group.  It helps developers measure the performance of their web applications by giving them access to high precision timestamps. It also provides a standardized API that analytics scripts and developer tools can use to display performance metrics.

UserTiming is natively supported in IE 10 and prefixed in Chrome 25+.  I wanted to use the interface for a few of my projects, so I created a small polyfill to patch the browsers that don’t support it natively. Luckily, a JavaScript version of UserTiming can be implemented to be 100% API-compatible — you just lose some precision and performance versus native browser support.

So here it is: UserTiming.js

README:

UserTiming.js is a polyfill that adds UserTiming support to browsers that do not natively support it.

UserTiming is accessed via the PerformanceTimeline, and requires window.performance.now() support, so UserTiming.js adds a limited version of these interfaces if the browser does not support them (which is likely the case if the browser does not natively support UserTiming).
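A typical use of the API looks like this (the mark and measure names below are arbitrary examples; this works against a native implementation or the polyfill, and also runs in modern Node.js, which exposes a global `performance`):

```javascript
// Typical UserTiming usage. The mark/measure names are arbitrary examples.
// Record a mark before and after the work you want to measure:
performance.mark('load-widget-start');

// ... do the work you want to measure ...

performance.mark('load-widget-end');

// A measure is the duration between two marks:
performance.measure('load-widget', 'load-widget-start', 'load-widget-end');

// Measures and marks show up on the PerformanceTimeline, where analytics
// scripts and developer tools can query them by name:
const [measure] = performance.getEntriesByName('load-widget');
```

The `duration` property of the resulting entry is the elapsed time in milliseconds, using the same high-precision clock as `performance.now()`.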

As of 2013-04-15, UserTiming is natively supported by the following browsers:

  • IE 10+
  • Chrome 25+ (prefixed)

UserTiming.js has been verified to add UserTiming support to the following browsers:

  • IE 6-9
  • Firefox 3.6+ (previous versions not tested)
  • Safari 4.0.5+ (previous versions not tested)
  • Opera 10.50+ (previous versions not tested)

UserTiming.js will detect native implementations of UserTiming, window.performance.now() and the PerformanceTimeline and will not make any changes if those interfaces already exist.  When a prefixed version is found, it is copied over to the unprefixed name.

UserTiming.js can be found on GitHub and as the npm usertiming module.

breakup.js

April 14th, 2013

It’s not you, it’s me.

A few months ago I released a small JavaScript micro-framework: breakup.js

Serially enumerating over a collection (such as using async.forEachSeries() in Node.js or jQuery.each() in the browser) can lead to performance and responsiveness issues if processing or looping through the collection takes too long. In some browsers, enumerating over a large number of elements (or doing a lot of work on each element) may cause the browser to become unresponsive, and possibly prompt the user to stop running the script.

breakup.js helps solve this problem by breaking up the enumeration into time-based chunks, and yielding to the environment if a threshold of time has passed before continuing.  This will help avoid a Long Running Script dialog in browsers as they are given a chance to update their UI.  It is meant to be a simple, drop-in replacement for async.forEachSeries().  It also provides breakup.each() as a replacement for jQuery.each() (though the developer may have to modify code-flow to deal with the asynchronous nature of breakup.js).

breakup.js does this by keeping track of how much time the enumeration has taken after processing each item.  If the enumeration time has passed a threshold (the default is 50ms, but this can be customized), the enumeration will yield before resuming.  Yielding can be done immediately in environments that support it (such as process.nextTick() in Node.js and setImmediate() in modern browsers), and will fallback to a setTimeout(..., 4) in older browsers.  This yield will allow the environment to do any UI and other processing work it wants to do.  In browsers, this will help reduce the chance of a Long Running Script dialog.
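The technique itself can be sketched in a few lines. This is an illustration of the time-slicing approach, not breakup.js’s actual API; the function name and defaults are assumptions:

```javascript
// Illustration of time-sliced enumeration (the approach breakup.js takes);
// this sketch is not breakup.js's actual API.
function eachInChunks(items, iterator, done, threshold) {
  threshold = threshold || 50; // ms of work allowed before yielding
  var i = 0;
  function run() {
    var sliceStart = Date.now();
    while (i < items.length) {
      iterator(items[i++]);
      // If this slice has run long enough, yield so the browser (or the
      // event loop) can do UI work, then resume where we left off.
      if (Date.now() - sliceStart >= threshold) {
        return setTimeout(run, 0);
      }
    }
    done();
  }
  run();
}
```

A real implementation would prefer `setImmediate()` or `process.nextTick()` over `setTimeout(..., 0)` where available, exactly as described above, since `setTimeout` is clamped to ~4ms in browsers.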

breakup.js is primarily meant to be used in a browser environment, as Node.js code is already asynchronously driven. You won’t see a Long Running Script dialog in Node.js. However, you’re welcome to use the breakup Node.js module if you want to have more control over how much time your enumerations take.  For example, if you have thousands of items to enumerate and you want to process them lazily, you could set the threshold to 100ms with a 10000ms wait time and specify the forceYield parameter, so other work is prioritized.

Check it out on GitHub or via npm.