How to Block Semalt Crawler and Clean Up Your Analytics



Semalt have annoyed many webmasters by skewing analytics as their crawler turns up as a referrer. To add insult to injury, the Semalt crawler does not honour robots.txt nor does using the Semalt removal tool do anything more than invite further bot visits.

This PDF outlines three ways you can stop Semalt from polluting your analytics:
1. Block the Semalt crawlers via .htaccess
2. Block the Semalt crawlers via PHP
3. Remove Semalt referrals via Google Analytics

Published in: Internet


Semalt Referral Spam And Small Business Websites

My clients are small businesses, often targeting a specific local area. More often than not, they are trying to get their head around this “online stuff” and want to make it work for them. One of the many things I’ll recommend is regular visits to their analytics. Life is much easier when you know where your traffic is coming from, what is working and what is not.

A local business owner wears many hats. Web marketing is a very small one and is often limited on time and budget. Checking your analytics can help prioritise which social sites to spend time on, and show whether any ads are bringing traffic and whether the time and/or money spent is actually bringing a return.

Enter Semalt – showing up as a referrer in analytics. Not once in a blue moon either…

Who are Semalt?

Semalt claim to be a

…professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.

(Their words, no I’m not linking to their page)

For a “professional webmaster analytics tool” one has to wonder why they think it is acceptable to send their crawler as a referrer and not a standard bot visit. They allegedly understand webmasters’ frustration at them totally screwing up your analytics with their referrer spam and invite you to remove your website from the seed list.

DON’T REQUEST TO REMOVE YOUR WEBSITE VIA SEMALT

I’ll show you why you shouldn’t use Semalt’s removal request in a moment. Of course, to find where their removal request is you have to do some rummaging around the web. The first you’ll hear of it is via comments on the many blogs complaining about Semalt, or maybe on Twitter. They have a person who just goes around tweeting and commenting, trying to ease people’s concerns. Semalt’s homepage gives you nothing but a sign-up form. No links to the usual pages like contact, services, privacy or anything else.
Being the sweet and innocent person I am (stop laughing), I removed my site from their seed list some months back in good faith. They did honour it. I no longer receive visits from the main Semalt crawler. All was quiet for a week or two and then I was bombarded with visits from various Semalt subdomains, kambasoft and savetubevideo.

The client I was talking to earlier today did her own check on Semalt and also put in a removal request for her website a month or so ago. She logged into analytics today to this…
(Google Analytics screenshot – this is just a few of the Semalt referrals to a small local website in 1 month!)

73% of her referral traffic (38% of total traffic) is from Semalt and friends. I have no idea what Semalt’s game is, but those numbers certainly are NOT honouring a removal request. Maybe in Ukraine (where Semalt are based) “Remove” actually means, “Come, bring your friends! We have cake!”

What is a web crawler?

A web crawler is an automated bot that systematically crawls the World Wide Web. Search engines use them to index your web pages so they can efficiently serve their search results. There are other uses for crawlers, both good and bad. Most of the time you wouldn’t be aware of their visits to your site.

Semalt claim that their crawler is no different from Google, Bing or Yahoo crawling your website. None of the major search crawlers come in as a referrer. Their bots (and others like them) pop along in the background and you wouldn’t see them screwing up your analytics by pretending to be real visits.

The screenshot above is from a local website for a small business based in a village in south England. It’s rarely updated. Google, Bing, Yahoo and others do pop along regularly but not several times per day. There’s simply no need to.

The long and short of the matter is that Semalt’s crawlers do not act like legit web crawlers.

What are Semalt up to?

At first glance, this appears to be nothing more than a shady marketing technique to get curious webmasters to visit their site. Go to any of the referral URLs for Semalt and you land on their main web page, which currently invites you to try their software free for 7 days.

Look a bit closer, visit the Kambasoft referrals and you are redirected to random websites – perhaps it’s just a way of driving traffic? Useless, untargeted traffic perhaps, but I am sure someone somewhere is fooled by big numbers.

Do more research and it gets a bit scary…
It would appear that Semalt are involved in more than shady referral tactics, going as far as actually infecting people’s computers with trojans to build their web of spambots. Read more about that side of things at nabble.

How to block Semalt and friends

3 Ways To Block Semalt And Clean Up Your Analytics

Semalt doesn’t appear to honour robots.txt. They also have so many IP addresses that blocking by IP is impractical. If you really want to try that route, you can find a list of IP addresses associated with Semalt here.

There are some alternative steps you can take.
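Since blocking by IP is impractical, the referrer-based approaches key on the Referer header instead: does the visit claim to come from a known spam domain, or any subdomain of it? Here is a minimal Python sketch of that check, just to show the matching idea. The hostnames in `SPAM_DOMAINS` (in particular the `.com` endings) and the function names `build_referrer_pattern` and `is_spam_referrer` are assumptions made up for illustration, not anything published by Semalt or by the authors credited in this document.

```python
import re

# Hypothetical referrer domains, for illustration only. The names come from
# the referrers mentioned in this post; the exact hostnames are assumptions.
SPAM_DOMAINS = ["semalt.com", "kambasoft.com", "savetubevideo.com"]


def build_referrer_pattern(domains):
    """Compile one regex that matches each domain and any of its subdomains."""
    alternatives = "|".join(re.escape(domain) for domain in domains)
    # ^https?://       http or https scheme
    # (?:[^./]+\.)*    zero or more subdomain labels, e.g. "12." or "a.b."
    # (?:...)          one of the listed domains
    # (?=[/:?#]|$)     the host must end here, so "semalt.com.example.org"
    #                  is not treated as a match
    return re.compile(
        rf"^https?://(?:[^./]+\.)*(?:{alternatives})(?=[/:?#]|$)",
        re.IGNORECASE,
    )


def is_spam_referrer(referrer, pattern):
    """Return True when the Referer value points at a listed domain."""
    return bool(referrer) and pattern.match(referrer) is not None


if __name__ == "__main__":
    pattern = build_referrer_pattern(SPAM_DOMAINS)
    samples = [
        "http://semalt.com/crawler.php",
        "https://12.semalt.com",
        "https://www.google.com/",
    ]
    for sample in samples:
        print(sample, is_spam_referrer(sample, pattern))
```

Matching on the whole host name rather than a loose substring keeps look-alike addresses from being caught by accident, and adding a newly spotted referrer is a one-line change to the list.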
1. Edit your .htaccess file (if you have one and have access)

Since Semalt ignores robots.txt, you can block its crawler using your .htaccess file. This file is very powerful and can break your website – so if you’re unsure then leave it be. The Semalt crawlers themselves don’t appear to be malicious at the moment, just a pain in the backside screwing up stats and using resources (this can become a problem though!)

Add the following code to your .htaccess file

    # Block fake traffic
    RewriteEngine on
    Options +FollowSymlinks
    # Block all http and https referrals from "" and all subdomains of ""
    RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)* [NC,OR]
    # Block all http and https referrals from "" and all subdomains of ""
    RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)* [NC,OR]
    # Block all http and https referrals from "" and all subdomains of ""
    RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)* [NC,OR]
    # Block all http and https referrals from "" and all subdomains of ""
    RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)* [NC]
    RewriteRule ^(.*)$ [L]

The code basically says that if a crawler comes in from any of the Semalt sites, turn it around and send it back to Semalt. They can have their spam back, thank you.

If you’re not as annoyed as me and don’t feel comfortable sending their bot back to them, you can replace the last line with

    RewriteRule .* - [F]

Many thanks to Michael Martinez for the code in their post over at Marketing Pilgrim, “Tips for Blocking Semalt and Botnet Attacks”.

The original bit of code I was using was this:

    # block visitors referred from
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} [NC]
    RewriteRule .* - [F]

This snippet only blocks Semalt, not subdomains or their friends at Kambasoft et al. Instead of redirecting the bot back to Semalt, it simply denies access. It worked perfectly well until Semalt started adding more and more sources. Since I reached hundreds of referrers between Semalt and Kambasoft, it was getting rather silly. The first code above is a cleaner way of doing it.

With the original snippet, you can add each referrer as you see one come in.

    # block visitors referred from
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} [NC,OR]
    RewriteCond %{HTTP_REFERER} [NC]
    RewriteRule .* - [F]

WordPress recommend adding the following code to .htaccess.

    SetEnvIfNoCase Referer spammer=yes
    Order allow,deny
    Allow from all
    Deny from env=spammer

Again, this will only block the original Semalt bot and you’ll need to add each referrer as you see it.

ALWAYS take a copy of your original .htaccess file so you can change back if anything does go pear-shaped. I did not write any of this code (I’m not that clever!) I make no guarantees of suitability, fitness for purpose or anything else.

2. Blocking via PHP

The nabble guys mentioned above, who tracked down Semalt using malware, added an update to their post:

Update / August 8 – We’ve created a simple PHP package to block referrer spammers such as Semalt from visiting your site:

Far too technical for me – but may be useful to people running PHP-based websites (or more likely, their developers!)

3. Remove Semalt from showing in Google Analytics

  • log into Google Analytics and select your website
  • click on ADMIN in the top menu bar
  • in the central PROPERTY column, click TRACKING INFO then REFERRAL EXCLUSION LIST
  • click the red ADD REFERRAL EXCLUSION button
  • enter the referrer URL in the box and click the blue CREATE button

You will need to add each referrer individually. Using this method won’t block Semalt but will stop them showing as a referrer in Google Analytics.

Conclusion

Semalt are obviously not a legit company. Regardless of their tactics to get people to sign up for their service, a legit business would honour requests not to crawl.
The mess in analytics is not helpful, particularly for small business owners who are pushed for time, resources and knowledge to make sense of them.

Other than making a mess in analytics, the Semalt crawlers don’t appear to be malicious. Of course, the software they push people to download may well be, so I advise staying well clear.

Did you find this useful? Please share with your network

About Jan Kearney

I believe that every business, no matter how small or how local, can use the power of the web to gain more customers. I offer no-bull coaching and mentoring so small business owners can strategically put the web to work for their business. I’ve been called a “compass” and a “navigator” and probably a few more names that aren’t suitable for a profile!

Connect with me on Google+, Facebook, Pinterest or follow along at the My Local Business Online blog