Command-line utility for automating the fight against Google Analytics referral spam
Google Analytics referrer spam is a huge and never-ending pain. There are hundreds of known referrer spam domains, and every other day a new one pops up. The only way to keep the spammers from skewing your web analytics reports is to block their stupid domain names, one by one.
ga-spam-control is a small command-line utility that keeps your Google Analytics spam filters up-to-date, automatically.
ga-spam-control creates filters for your Google Analytics accounts that block known referrer spam domains from your analytics reports, and it keeps these filters up-to-date.
To protect your analytics reports from annoying false entries at all times, ga-spam-control combines multiple community-maintained lists of known spam domains:
- ddofborgs' Analytics Ghost Spam List
- Stevie Rays' apache-nginx-referral-spam-blacklist
- Piwik Referrer spam blacklist
This gives you the ability to completely automate your spam protection process. Just let ga-spam-control check your Google Analytics accounts daily for new spam and update your filters whenever it detects some.
The command line utility provides the following actions.
Spam Control Filter Actions
In order to protect your Google Analytics account from spam, ga-spam-control creates filters that block known referrer spam domains from your analytics reports. These are the commands that help you review and update your spam filters:
- filters status displays the spam-control status of all your accounts or for a specific account
- filters update creates or updates the spam-control filters for a specific account
- filters remove removes all previously created spam-control filters from an account
Referrer Spam Domains Actions
The basis for the spam filters is an up-to-date list of known referrer spam domains. And with these commands you can review and update the spam-domain lists:
- domains list prints a list of all currently known referrer spam domains
- domains update downloads the latest referrer spam domain name lists and updates your local list of known referrer spam domains
- domains find allows you to manually review the last n days of analytics data and mark domain names as spam
The domains that are currently considered spam are stored in ~/.ga-spam-control/spam-domains/community.txt and ~/.ga-spam-control/spam-domains/personal.txt.
ga-spam-control <command> [<args> ...]
Print information about the available actions:
ga-spam-control help
Print detailed help information about the different arguments and flags of a specific action:
ga-spam-control help <actionname>
The first time you perform an action, you will be shown an OAuth authorization dialog.
If you grant the requested permissions, the authentication token will be stored in your home directory (~/.ga-spam-control/credentials.json).
To sign out you can either delete the file or de-authorize the "Google Analytics Spam Control" app in your Google App Permissions at https://security.google.com/settings/security/permissions.
Display the current spam-control status for all accounts that you have access to:
ga-spam-control filters status
Display the spam-control status in a parseable format:
ga-spam-control filters status --quiet
Display the current spam-control status for a specific Google Analytics account:
ga-spam-control filters status <accountID>
Create or update the spam-control filters of a given Google Analytics account:
ga-spam-control filters update <accountID>
Remove the spam-control filters of a given Google Analytics account:
ga-spam-control filters remove <accountID>
This will simply remove all filters that ga-spam-control created earlier.
Print a list of your known referrer spam domain names (community & personal):
ga-spam-control domains list
Update your local community list of known referrer spam domain names:
ga-spam-control domains update
Find referrer spam domain names in your Google Analytics data. Review the hostnames of the last n days of one of your Google Analytics accounts and mark those which you consider spam. All marked domain names will be added to your personal referrer spam list:
ga-spam-control domains find <accountID> <numberOfDaysToLookBack>
By default ga-spam-control will use the last 90 days of analytics data. If you want to review fewer or more days, you can specify the number of days yourself.
The command-line package is github.com/andreaskoch/ga-spam-control/cli. You can clone the repository or install it with go get github.com/andreaskoch/ga-spam-control and then run the make.go script:
go run make.go -test
go run make.go -install
go run make.go -crosscompile
Or with make:
make test
make install
make crosscompile
ga-spam-control is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Ideally Google would just build spam protection into Google Analytics, but until then here are some ideas for additional features and possible improvements:
- Make remote spam domain providers configurable
- Publish spam domain names that you found in your Google Analytics accounts back to the community lists.
- Populate my own list of known referrer spam domains with the results from the find-spam-domains action.
- Automatic daily upload from the ga-spam-control clients
- Review of the additions by trusted community members or by a tool which checks the listed website
- Create and update a "No Referrer Spam" segment and update it during the normal update process. Unfortunately I will need Google to add create and update support to the Google Analytics API for this to work (see: analytics-issues - Issue 174: Create Advanced Segment and Customized Report Through API).
- Until Google supports segment creation via the API, ga-spam-control can at least print the necessary segment content to support manual editing of spam segments.
- Use machine learning to automatically identify new referrer spam. Earlier versions of ga-spam-control already used a machine learning model. But unfortunately I could only train the model to detect new referrer spam for a single website - the model did not work well enough when I applied it to websites with different usage patterns.
- Other options for detecting referrer spam automatically
- Correlate analytics data with web server logs to identify referrer spam
- Do a word analysis of the referrer site and use regular e-mail spam detection techniques to identify spam sites
Let me know if you have other ideas, or if you want one of the features implemented next.
There are multiple curated lists of referrer spam domains out there that you can use to create filters for your analytics accounts.
- Analytics Ghost Spam List
- Stevie Ray: apache-nginx-referral-spam-blacklist
- Piwik Referrer spam blacklist
- Referrer Spam Blocker Blacklist
- My own list of referral spam domains
ga-spam-control is not the first and not the only tool that helps you to block referrer spam from your Google Analytics accounts.
Filters prevent referrer spam from getting into your Google Analytics accounts. But filters don't help you with referrer spam that has already reached your reports. To remove this spam from your reports you can use segments that exclude the spammy traffic.
Google Analytics has a setting to block bots and spiders from your Google Analytics reports.
- Go to Google Analytics > Admin > Account > Property > View > View Settings
- Go to Bot Filtering
- Check Exclude all hits from known bots and spiders
This feature is not advertised much by Google. The only time it was officially mentioned is in a Google Plus post: Google Analytics - Introducing Bot and Spider Filtering.
I am not yet sure if this flag does the trick. One would assume that it would be easy for Google to exclude all referrer spam and block the stupid spammers once and for all.