UPDATE 9/24/15
Adam Steele from Loganix put out a new tool to address referral spam. Works like a charm: Loganix Referral Spam Blocker
Another great article about the subject: http://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/
UPDATE 4/1/15
Been seeing a huge uptick in referral spam lately (social-buttons.com, semalt.semalt.com), which is driving me bonkers! I went looking for the best option. From everything I have read, this spam exploits Google’s servers directly, so the .htaccess code further down this page will not stop it. Below is what I found to be the best solution until Google corrects the issue.
“They use a vulnerability in Google Analytics to make fake visits, so the only way to stop them for now, until Google fixes it, is to make a filter in GA, since that is the source of the problem.
Blocking them in the .htaccess file is pointless since this kind of spam never visits your site.
Check this answer for more information about this spam: http://stackoverflow.com/a/29312117/3197362
And this one for referrer spam in general, with some methods you can use to filter it and stop future occurrences: http://stackoverflow.com/a/28354319/3197362
As for the previous/historical data, you can use segments in Google Analytics. Create a REGEX with the spam names, something like this:
social-buttons.com|simple-share-buttons.com
You can add as many as you want, but the REGEX has a 255-character limit; add multiple conditions if you run past it.
- Go to the Reporting section of your Google Analytics.
- In the sidebar, expand Acquisition > All Traffic and select Referrals.
- In the main panel, click +Add Segment.
- Click on New Segment.
- Select Conditions, under Advanced.
- Set the filter to Exclude. Change Ad Content to Medium and contains to exactly matches, then type and select referral in the text box.
- Click on AND.
- Change Ad Content to Source and contains to matches regex, then paste the spam regex.”
Full post can be found here.
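For reference, the exclusion pattern tends to grow as new spammers appear. Here is a sketch of what the regex might look like once a few more offenders are added, with the dots escaped so they match literally (the extra domains are just examples pulled from this post):
social-buttons\.com|simple-share-buttons\.com|semalt\.semalt\.com|darodar\.com
Remember the 255-character limit; once you hit it, split the list across additional conditions.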
Clearing referral spam out of your reports is one of the most important things you can do for your analytics, because robot and non-human visits can throw off your SEO strategy and make it very difficult to see where your real traffic is coming from. The first step is understanding the non-human traffic in Google Analytics. There are three types:
- Bots and spiders that behave properly, like Google’s own spiders.
- Crawlers like semalt and makemoneyonline.
- “Fake” referrals that come from darodar, ilovvitaly, blackhatworth and priceg.
So, how do you implement filters to get rid of this robot traffic? The first step is caution: always keep one completely unfiltered view so you have a record of your raw data. Second, when you create a filter, build it in a test view that has the same settings as your main view so that you don’t break your main reports. Finally, once you have confirmed the filter works properly, apply it to your main view. Now, here is some specific information on filtering the three types of bot visitors.
The Google Spiders & Other Bots That Behave
If it weren’t for the Google spiders and others like them, there would be no web to surf. Spiders get sent out from their mothership websites to find and index new content, which is then published on indexes like Google and other aggregators. Well-behaved publishers and companies identify their bots so that analytics tools can recognize them, and Google lets you filter these spiders with one simple checkbox. Just go to the Admin section of your Analytics account, open View Settings for each view, and check the box that says “Exclude all hits from known bots and spiders.”
Semalt and Other Unidentified Crawlers
It can be difficult to know what to do with bots like Semalt. Some of these identified crawlers are best removed from your reports by visiting the crawler’s website and asking to be taken off its list, while others will give you a virus if you try that. A good way to judge whether a bot is harmful is to search for it and read the first 2 or 3 pages of results to find out whether it’s an infection or just a site that doesn’t identify its crawlers the way it should.
Don’t actually click the links in the search results when you’re researching suspected spyware (you might pick up a virus that way); just look through the results and get an idea of whether it is safe to visit the site. If it is, visit and ask to be excluded. In the case of Semalt, it is safe to visit their site and ask to be removed. You can also check the Google Analytics Group or its Google Plus page for reports on such spiders.
As for removing them without visiting the site, that is quite simple and something you’ll do from your Analytics admin page, but it isn’t the most effective approach, because the visits still hit your server and still appear in any unfiltered view’s session totals. The best way, if you know how, is to block them in your .htaccess file (on Apache web servers) – see the darodar.com example at the end of this post – and a little research will turn up the equivalent at your web host.
As for filtering using Analytics, you do it by finding a signature for the site. For example, with Semalt (even though you can also handle Semalt by going to their site and asking for an exclusion), a filter that excludes anything from semalt.com would work. Make sure your filter doesn’t also exclude sites or referrers that you want to show up in your reports, however. Also keep in mind that you can escape the dot with a backslash, as in semalt\.com, so the regex matches a literal dot; you don’t strictly need to, since an unescaped dot rarely catches anything you meant to keep, but it is the safer habit.
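As a sketch, that exclude filter set up under Admin > View > Filters would look something like this (Campaign Source is the filter field Analytics uses for the referring source; the pattern is the example from above):
Filter Type: Custom > Exclude
Filter Field: Campaign Source
Filter Pattern: semalt\.com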
Also, bear in mind that you will need to modify your filter for each new unidentified crawler, and you will run out of filter space eventually, so review an unfiltered view a few times a year; most bots stop crawling your site after a few months, and their entries can be retired.
Filtering Fake Referrals
Fake referrals are some of the latest arrivals to the game of messing with your Analytics reports. They come from darodar.com, economy.co, ilovevitaly.com and blackhatworth.com, and there will probably be more by the time you read this. Why are they called fake referrals? Because they never actually visit your site. Instead, they post a fake pageview to Google’s tracking service using random tracking IDs, and when one of those IDs is yours, Google shows a hit in your reports.
You might be asking at this point: how can you block something that has never actually visited your site? Good question. Using your .htaccess file or JavaScript is out, but you can still create a filter to exclude them. You are going to have a hard time keeping up, though, because as soon as you create a filter for one, another pops up. They are like cockroaches – or maybe a hydra.
How to Eliminate Fake Referrals Completely
There are a few methods for getting rid of these fake referrals. The first, and probably the most effective, is to create an include filter that admits only your valid hostnames. Because the spammers pick tracking IDs at random and never load your pages, their hits arrive with a hostname that isn’t one of yours, so an include filter on valid hostnames screens out everything that doesn’t genuinely run your tracking code. You need to be careful with this method, however: done incorrectly, it will exclude valid traffic, which defeats the purpose.
You have to identify all of the valid hostnames that serve your site, and all of the other ones that use your website tracking ID. That can include third-party services where you track activity, so if you include nothing but your web host you will lose legitimate hits from services like WordPress, PayPal and YouTube.
The best way to do this is to start with a report covering several years that shows just the hostnames, then go through them and confirm the validity of each. For example, you can use the All Traffic report, choose Medium, click on referral, and add “Behavior – Hostname” as a second level of filtering. This will be simple or complex depending on your sites and the sprawl of your online presence. It may take some time and investigation to get right, but it is important that you do.
Once you have determined what is valid, create a filter that lists all of the hostnames you have deemed legitimate, then test it to make sure you have not missed any. When you are certain the list is complete, you can start using it. Remember, you have to update this filter whenever you add your tracking ID to a new web service, and you should always keep an unfiltered view around so you can confirm you aren’t excluding valid traffic.
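As an illustration, such an include filter might look like the sketch below; the hostnames are placeholders, so substitute the list you confirmed in your own report:
Filter Type: Custom > Include
Filter Field: Hostname
Filter Pattern: yoursite\.com|checkout\.yourstore\.com|paypal\.com
The pattern is a regex, and because it is unanchored it will also match subdomains such as www.yoursite.com.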
Excluding Using the Tracking Number Suffix
There is another method that seems to work well if you want to test it. Most of the time, spammers sending “fake” referrals target tracking numbers that end in 1 (like UA-2345678-1), but if you create another property in your Google Analytics account and change your tracking code to the new ID ending in 2 or 3, you will notice that most of these fake referrals stop showing up. You may still get a few, but most target the tracking number ending in 1. The downside to this method is that you lose the continuity of your reports, since you can’t transfer the data from one tracking-number property to another.
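Assuming you use the standard analytics.js snippet, the switch is just a matter of swapping the property ID on every page; UA-2345678-2 below is a made-up example of the second property:
ga('create', 'UA-2345678-2', 'auto'); // was 'UA-2345678-1'
ga('send', 'pageview');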
The Referral Exclusion List
You may find some people recommending the Referral Exclusion List feature in Analytics, but keep in mind this is not the best solution, because it is notoriously unreliable for spam. It might remove some of the crawlers you don’t want, but it can also reclassify their visits as direct traffic, so they keep appearing in your reports under a different label. Whether it works on ghost referrals at all depends on the parameters the spam creator has set. So stick with the manual solutions and don’t worry about the Referral Exclusion List; it isn’t even intended for this sort of filtering anyway.
Why They Send Unidentified and Fake Crawlers
There are several different reasons someone might send out bots like the ones we’ve talked about. The most probable is to gather information, just like the spiders from Google’s web servers do. The difference is that while Google does it for a specific and legitimate purpose – indexing sites for the search results – these other bots are probably doing it for a less legitimate, and possibly quite nefarious, purpose. For example, looking for security vulnerabilities.
Another reason spammers send out bots is that they want site owners to visit their sites – possibly to pick up malware, but most likely just because you’ll want to see who referred to your own site. Semalt does exactly this, because they sell SEO services: they target small website owners who want to find out who linked to their site (and who will hopefully buy SEO services from them).
Finally, it all comes down to spam. They are trying to pull as much traffic to their site as possible through the shady method of spamming site owners with referral links, and that can get them lots and lots of free page views. The beauty of this method, from their side, is that it doesn’t register as a bounce, because many site owners will spend several minutes searching the page for a link to their own website.
How They Do It
As for how they do it, it is relatively simple. Traditional crawlers look for links on a batch of webpages, follow them, follow the links they find there, and so on, ad infinitum. Newer crawlers execute JavaScript to fetch dynamic page content, and in doing so they trigger your Analytics tracking code. The ghost variety may not visit your site at all – they simply fire hits at a range of randomly selected tracking IDs. Since they don’t know your server’s hostname, the hit arrives labeled with something else entirely, which is one way to tell the traffic is manufactured rather than real.
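To make that concrete: Google Analytics accepts hits over its public Measurement Protocol, so a spammer can register a “visit” with a single HTTP request and never touch your server. A rough sketch of such a request (the parameter names come from the Measurement Protocol; the tracking ID, client ID and referrer are made up):
GET http://www.google-analytics.com/collect?v=1&tid=UA-2345678-1&cid=555&t=pageview&dp=%2F&dr=http%3A%2F%2Fsemalt.com
The tid value is guessed at random; whenever the guess happens to be a real tracking ID, the pageview and its fake referrer land in that account’s reports.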
Do You Really Need All Three of These Filters?
Yes, you need all three of these filters. Each one does something different and filters a different type of bot. If you don’t use all three, you are still going to see non-human traffic in your analytics reports.
To Summarize:
Exclude Google’s bots and other well-behaved crawlers by telling Analytics to ignore them: go to Admin > View Settings and check “Exclude all hits from known bots and spiders.”
Exclude unidentified crawlers (unidentified meaning they don’t self-identify as bots like they are supposed to) by editing your .htaccess file, or remove them from your reports using filters – though filtered hits still count when Google determines whether to apply data sampling.
Exclude fake (ghost) referrals with a valid-hostname include filter, or use the method of switching to a tracking ID with a different suffix.
Edit: Block Darodar.com (.htaccess Method)
Add this code to your .htaccess file.
# Flag any request whose Referer header matches darodar.com (case-insensitive regex; \. matches a literal dot)
SetEnvIfNoCase Referer darodar\.com spambot=yes
# Let everyone in, then turn away anything flagged above (Apache 2.2 syntax)
Order allow,deny
Allow from all
Deny from env=spambot
To block other referrers, simply change the domain in the SetEnvIfNoCase line.
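For example, to block several spam referrers at once, you can stack SetEnvIfNoCase lines above the same allow/deny block (these domains are the ones mentioned earlier in this post):
SetEnvIfNoCase Referer semalt\.com spambot=yes
SetEnvIfNoCase Referer ilovevitaly\.com spambot=yes
SetEnvIfNoCase Referer blackhatworth\.com spambot=yes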
Thanks to Dale from Sudo Rank for this:
Article Here