During the past few months, several of my websites have been the target of some sort of SPAM attack. After Cacti alerted me that my servers were under high load, I found that a small number of IP addresses were repeatedly loading, or POSTing to, the same pages. In one attack, multiple IP addresses were simply reloading a page several times a second. In another, they were POSTing several megabytes of data to a form (which spent time validating the input), several times a second. I’m not sure of their motives; my guess is that it’s either someone trying to game search rankings (the POSTs) or an improperly configured robot.
Since I didn’t have anything in place to automatically drop requests from these rogue SPAMmers, the load on the servers kept climbing, which slowed down page loads for real visitors.
After looking at the server’s Apache access_log, I was able to narrow down the IPs causing the issue. With those IPs in hand, I simply created a few iptables rules to drop all packets from them. Within a few seconds, the load on the server returned to normal.
I didn’t want to play catch-up the next time this happened, so I created a small script to automatically parse my server’s access_logs and auto-ban any IP address that appears to be doing inappropriate things.
The script is pretty simple. It uses tail to look at the last $LINESTOSEARCH lines of the access_log, keeps only the lines containing $SEARCHTERM with grep, extracts the client IPs with awk, then sorts and counts them with uniq -c. It then checks whether any of the top 20 IPs made more than $THRESHOLD matching requests. If so, it queries iptables to see whether the IP is already banned. If not, it appends a single INPUT rule to DROP packets from that IP and emails the current rule list to $ALERTEMAIL.
Here’s the code:
#!/bin/bash
#
# Config
#
# if more than the threshold, the IP will be banned
THRESHOLD=100
# search this many recent lines of the access log
LINESTOSEARCH=50000
# term to search for
SEARCHTERM=POST
# logfile to search
LOGFILE=/var/log/httpd/access_log
# email to alert upon banning
ALERTEMAIL=foo@foo.com
#
# Get the last n lines of the access_log and keep those containing the search term.
# Count requests per IP and output each IP whose count is larger than the threshold.
#
for ip in $(tail -n "$LINESTOSEARCH" "$LOGFILE" \
    | grep "$SEARCHTERM" \
    | awk '{print $1}' \
    | sort | uniq -c | sort -rn | head -20 \
    | awk -v threshold="$THRESHOLD" '$1 > threshold {print $2}')
do
    # Look in iptables to see if this IP is already banned.
    # -w -F: match the whole address literally, so 1.2.3.4 does not match 11.2.3.45
    if ! iptables -L INPUT -n | grep -qwF -- "$ip"
    then
        # Ban the IP
        iptables -A INPUT -s "$ip" -j DROP

        # Notify the alert email with the current rule list
        iptables -L -n | mail -s "Apache access_log banned '$SEARCHTERM': $ip" "$ALERTEMAIL"
    fi
done
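To see what the counting pipeline in the script produces without touching iptables, it can be pulled out into a standalone function and run against a small, made-up log. This is only a sketch: the function name `find_offenders` and the sample log lines are hypothetical. It mirrors the script's tail | grep | awk | sort | uniq -c | sort -rn | head -20 | awk chain, but only prints the IPs that would be banned, so it is safe to run as a normal user while tuning $THRESHOLD.

```bash
#!/bin/bash
# Hypothetical helper: print every IP in the last N lines of a log that made
# more than THRESHOLD requests containing TERM. Same pipeline as the ban
# script, but it only prints, so it needs no root access.
find_offenders() {
    local logfile=$1 lines=$2 term=$3 threshold=$4
    tail -n "$lines" "$logfile" \
        | grep "$term" \
        | awk '{print $1}' \
        | sort | uniq -c | sort -rn | head -20 \
        | awk -v threshold="$threshold" '$1 > threshold {print $2}'
}

# Build a tiny made-up access_log: 5 POSTs from one address, 1 POST from
# another, plus 3 GETs that the grep stage should ignore.
sample=$(mktemp)
{
    for i in 1 2 3 4 5; do
        echo '203.0.113.7 - - [01/Jan/2012:00:00:01 +0000] "POST /form HTTP/1.1" 200 512'
    done
    echo '198.51.100.2 - - [01/Jan/2012:00:00:02 +0000] "POST /form HTTP/1.1" 200 512'
    for i in 1 2 3; do
        echo '198.51.100.2 - - [01/Jan/2012:00:00:03 +0000] "GET / HTTP/1.1" 200 1024'
    done
} > "$sample"

# With a threshold of 3, only 203.0.113.7 (5 POSTs) is printed.
find_offenders "$sample" 50000 POST 3
rm -f "$sample"
```

Running it against the real log, for example `find_offenders /var/log/httpd/access_log 50000 POST 100`, shows which addresses the script would ban with the settings above.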
You can run this from cron so it executes every few minutes. The script needs root access to modify iptables rules.
I keep the script in /etc/cron.10minutes, with an entry in /etc/crontab that runs every file in that directory every 10 minutes (run-parts only runs files that are executable):
0,10,20,30,40,50 * * * * root run-parts /etc/cron.10minutes
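Because iptables changes require root, one option is to have the script check its effective user ID before it does anything, so a run from the wrong crontab fails loudly instead of silently. A minimal sketch; the function name `require_root` is hypothetical, and it would be called near the top of the script, right after the config block.

```bash
#!/bin/bash
# Hypothetical guard: return non-zero (with a message on stderr) unless the
# script is running with an effective UID of 0 (root).
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "This script must be run as root to modify iptables rules." >&2
        return 1
    fi
    return 0
}

# Typical call site in the ban script:
#   require_root || exit 1
```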
Warning: Make sure the $SEARCHTERM you use will not match a large set of the pages that a web crawler (for example, Googlebot) would request, or you may end up banning the crawler. In my case, I set SEARCHTERM=POST because I know that Google will not be POSTing to my website: all of my forms are excluded from crawling via robots.txt.
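Because grep matches $SEARCHTERM anywhere in the log line, a GET for a URL that merely contains the string POST would also be counted. Assuming Apache's common or combined log format, the sixth whitespace-separated field is the request method with a leading double quote (for example `"POST`), so awk can match on that field directly. A sketch only: the function name `count_method` is hypothetical, and a custom LogFormat directive would shift the field position.

```bash
#!/bin/bash
# Hypothetical variant of the counting stage: count requests per IP whose HTTP
# method is exactly METHOD. Assumes Apache common/combined log format, where
# field 6 is the method preceded by a double quote, e.g. "POST
count_method() {
    local logfile=$1 method=$2
    awk -v m="\"$method" '$6 == m {print $1}' "$logfile" \
        | sort | uniq -c | sort -rn
}

# Example: top POSTers in the live log
#   count_method /var/log/httpd/access_log POST | head -20
```

Its output has the same "count IP" shape as the `uniq -c` stage in the script, so it could replace the grep/awk/sort/uniq steps there.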
The full code is also available as a GitHub Gist if you want to fork or modify it. It’s a rather simplistic, brute-force approach to banning rogue IPs, but it has worked for my needs. You could easily extend the script to be a bit smarter. If you do, let me know!
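One way to make the script a bit smarter is a whitelist of addresses that should never be banned, such as your own office IP or a monitoring host. A minimal sketch; the `WHITELIST` variable, the `is_whitelisted` function and the sample addresses are hypothetical, and it only does exact string matches (no CIDR ranges).

```bash
#!/bin/bash
# Hypothetical whitelist: space-separated IPs that must never be banned.
WHITELIST="127.0.0.1 192.0.2.10"

# Return 0 if the given IP exactly matches an entry in WHITELIST, else 1.
is_whitelisted() {
    local candidate=$1 allowed
    for allowed in $WHITELIST; do
        if [ "$candidate" = "$allowed" ]; then
            return 0
        fi
    done
    return 1
}

# Inside the ban loop, before the iptables check:
#   is_whitelisted "$ip" && continue
```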