
Spam search blocking

Spam searches are a common feature on WordPress sites. Spam bots make lots of useless queries. They hope sites display the search queries somewhere, providing links to spam sites. This is unpleasant, clogs up the search logs and wastes resources.

Relevanssi Premium introduced a new spam blocking feature in version 2.15.0. You can use this tool to block spam queries. Spam is best blocked at the server level, before the request ever starts up WordPress. That’s often harder to do, so this feature provides at least some level of protection against search spam.

The Relevanssi spam blocking works with keywords. If the keywords appear in the search query, the query process stops immediately. You can also block all queries that contain Chinese or Cyrillic characters, or emoji.
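To give an idea of what a character-based check like this can look like, here’s a minimal sketch. It is not Relevanssi’s actual code: the function name rlv_example_detect_scripts is made up, and the emoji range is a rough approximation that only covers the most common pictograph blocks.

```php
<?php
/**
 * Hypothetical helper (not part of Relevanssi): lists which of the
 * blockable character groups appear in a search query.
 */
function rlv_example_detect_scripts( string $query ): array {
	// Unicode script properties require the /u (UTF-8) modifier.
	$patterns = array(
		'cyrillic' => '/\p{Cyrillic}/u',
		// Han covers Chinese characters, including kanji in Japanese text.
		'chinese'  => '/\p{Han}/u',
		// Assumption: a rough emoji range, not a complete emoji definition.
		'emoji'    => '/[\x{1F300}-\x{1FAFF}\x{2600}-\x{27BF}]/u',
	);

	$found = array();
	foreach ( $patterns as $name => $pattern ) {
		if ( 1 === preg_match( $pattern, $query ) ) {
			$found[] = $name;
		}
	}
	return $found;
}
```

Calling rlv_example_detect_scripts( 'купить дешево' ) returns an array containing 'cyrillic', while a plain English query returns an empty array. Keep in mind that a check like this also stops legitimate visitors who search in those scripts.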

Be careful: there’s no record made of the blocked queries. It’s possible to break your search for legitimate users with this feature.

How to use the spam blocking

You can find the spam blocking feature on the Relevanssi settings page under the “Spam Block” tab. If you can’t see the tab, make sure you’re running Relevanssi Premium with version 2.15.0 or higher.

To get an idea of suitable keywords to block, check the Dashboard > User searches. If there’s nothing there, enable logging in Relevanssi settings. (If you’re reading this, you likely have logging enabled and have noticed the spam queries in the logs.)

As mentioned on the settings page, top-level domains are usually very good block terms. Include the dot: .com, .cn, .shop, .online. With the dot, these are rare in real searches, but commonplace in spam.
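Here’s a small sketch that shows why the dot matters. It assumes a plain case-insensitive substring match, which may differ from how Relevanssi matches the keywords internally. The function name rlv_example_is_blocked and the sample queries are made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not Relevanssi's implementation): returns true
 * when the query contains any of the block terms, ignoring case.
 */
function rlv_example_is_blocked( string $query, array $block_terms ): bool {
	foreach ( $block_terms as $term ) {
		// Skip empty terms, which would otherwise match every query.
		if ( '' !== $term && false !== stripos( $query, $term ) ) {
			return true;
		}
	}
	return false;
}

$block_terms = array( '.com', '.cn', '.shop', '.online' );

// A spam-style query with a domain name is caught.
var_dump( rlv_example_is_blocked( 'cheap watches best-deals.shop', $block_terms ) ); // bool(true)

// The same words without the dot pass through.
var_dump( rlv_example_is_blocked( 'commercial online shop', $block_terms ) ); // bool(false)
```

Without the dot, a block term like "com" would also stop harmless searches such as "commercial" or "computer".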

Relevanssi spam blocking is also applied to pages with the highlight parameter. This kind of spam traffic can also be a problem.

Blocking bots from search results

You can also use Relevanssi spam blocking to block bots from search results pages. This can be useful: search engine bots can hit your site search a lot, with little benefit. According to Google’s John Mueller, Google doesn’t want your internal site search pages in the index. They create infinite crawling spaces, they’re often low quality and often lead to empty pages. Search engines have limited time for crawling your site. You want them crawling your actual pages instead of low-quality search results pages.

This also makes sense for performance purposes. On a site I own, there were 20,000 search queries in the server access log. Bing bot made about 16,000 of these. That’s a ton of wasted server power and a lot of unnecessary traffic.
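If you want to check the numbers on your own site, something like the sketch below can count the search requests in an access log. It assumes the common "combined" log format, where the request is the first quoted field and the user agent is the last one, and it treats /search/ paths and s= parameters as searches. The function name and the log file path are hypothetical, so adjust them to your setup.

```php
<?php
/**
 * Hypothetical helper: counts search requests in access log lines and
 * how many of them have a user agent containing $bot_token.
 * Assumes the "combined" log format.
 */
function rlv_example_count_search_hits( array $lines, string $bot_token ): array {
	$total = 0;
	$bot   = 0;
	foreach ( $lines as $line ) {
		// Capture the request path and the last quoted field (the user agent).
		if ( ! preg_match( '/"[A-Z]+ (\S+) [^"]*".*"([^"]*)"\s*$/', $line, $m ) ) {
			continue;
		}
		$path      = $m[1];
		$is_search = 0 === strpos( $path, '/search/' ) || 1 === preg_match( '/[?&]s=/', $path );
		if ( ! $is_search ) {
			continue;
		}
		$total++;
		if ( false !== stripos( $m[2], $bot_token ) ) {
			$bot++;
		}
	}
	return array(
		'total' => $total,
		'bot'   => $bot,
	);
}
```

You could feed it a log file with rlv_example_count_search_hits( file( '/path/to/access.log', FILE_IGNORE_NEW_LINES ), 'bingbot' ) and compare the two counts.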

It’s a good idea to tell bots not to access your search results pages. To do that, use this code snippet. It adds the required rules to your robots.txt file:

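// Append rules to robots.txt that ask crawlers to skip search results pages.
add_action( 'do_robots', 'rlv_block_bots_robots_txt' );
function rlv_block_bots_robots_txt() {
	// Everything between the PHP tags is printed as-is into robots.txt.
	?>
User-agent: *
Disallow: /search/
Disallow: /?s=
	<?php
}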

That should keep the search engine crawlers from visiting your search results. If you want to make sure, you can also use the bot filter in the Relevanssi spam block settings. The bot blocking uses the same list of bots that Relevanssi uses to keep bot searches out of the logs. You can adjust this list with the relevanssi_bots_to_block filter hook.
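As a rough sketch, a filter function on that hook could look like this. The hook name comes from the text above, but the list format is an assumption: I’m assuming here that the list is an array where the key is a label and the value is a string matched against the user agent. Check the list on your own installation before relying on this. "ExampleBot" is a made-up crawler name.

```php
<?php
/**
 * Adds one more crawler to the list of blocked bots.
 * Assumption: the list is an array of label => user-agent substring pairs.
 */
function rlv_example_add_bot( $bots ) {
	// "ExampleBot" is a placeholder, replace it with the bot you want to block.
	$bots['Example Bot'] = 'ExampleBot';
	return $bots;
}

// Register the filter only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_bots_to_block', 'rlv_example_add_bot' );
}
```

To stop blocking a bot instead, the same function could unset() the matching entry before returning the list.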

The bot blocking is never applied to the highlights. That would be detrimental for search engine visibility.

Blocking bad bots on server level

There are security plugins that can block bad bots from accessing your site. These plugins have other issues, and in general it’s always better to block the bots on a lower level. That will both increase security and save web server resources.

For Nginx users, there’s Nginx Ultimate Bad Bot & Referrer Blocker. For Apache users, there’s Apache Ultimate Bad Bot & Referrer Blocker. Installing these tools requires knowledge about web servers. Used well, they remove load from your server and make the Relevanssi spam blocking unnecessary.

What if you don’t have Premium?

Well, you should buy Premium (it has lots of other cool features)! If you can’t, or want to do some keyword blocking, you can also do it in code without Premium. See Keyword-based search blocking.
