Keyword-based search blocking

Update: In Relevanssi Premium 2.15.0 and later, you can block spam searches from the Relevanssi settings. Just navigate to the Spam Block tab in the Relevanssi settings!

If your search logs are full of spam with repeating keywords, you’re being targeted by a spammer. Their goal is to get visibility for their malicious URLs, hoping Google will index your search results pages with the spam URLs in them.

There are probably better ways to stop these kinds of spammers, but here’s one approach you can use without any access to server-level settings. Add the following code to your theme’s functions.php:

// pre_get_posts is an action hook, so register the callback with add_action().
add_action( 'pre_get_posts', 'rlv_block_search' );
function rlv_block_search( $query ) {
    if ( ! empty( $query->query_vars['s'] ) ) {
        // Add blacklist entries here; no need for whole words, use the smallest part you can.
        $blacklist = array( '大奖', 'q82' );
        foreach ( $blacklist as $term ) {
            // Case-insensitive, multibyte-safe substring match.
            if ( mb_stripos( $query->query_vars['s'], $term ) !== false ) {
                // Answer with 410 Gone and stop WordPress right here.
                http_response_code( 410 );
                exit();
            }
        }
    }
}

Now any search that includes a blacklisted term stops WordPress execution immediately and returns a 410 Gone status. If Google tries to index the spam search pages, the status tells it that the pages shouldn’t be indexed.
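
As a variation on the snippet above, this sketch separates the matching from the WordPress hook and limits the check to the main front-end query, so searches in the admin area are left alone. The names rlv_is_blocked_search and rlv_block_search_main are made up for this example; is_admin(), is_main_query() and get() are standard WordPress functions and methods. The function_exists() guard is only there so the helper can be tried outside WordPress. Treat it as a starting point, not a tested drop-in replacement.

```php
/**
 * Hypothetical helper: true if $search contains any of the $blacklist terms.
 * The match is case-insensitive and multibyte-safe; empty terms are ignored.
 */
function rlv_is_blocked_search( $search, array $blacklist ) {
    foreach ( $blacklist as $term ) {
        if ( '' !== $term && mb_stripos( $search, $term ) !== false ) {
            return true;
        }
    }
    return false;
}

/**
 * Hypothetical callback: blocks blacklisted searches on the main front-end query only.
 */
function rlv_block_search_main( $query ) {
    if ( is_admin() || ! $query->is_main_query() ) {
        return; // Leave admin searches and secondary queries alone.
    }
    $search = $query->get( 's' );
    if ( ! empty( $search ) && rlv_is_blocked_search( $search, array( '大奖', 'q82' ) ) ) {
        http_response_code( 410 );
        exit();
    }
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'pre_get_posts', 'rlv_block_search_main' );
}
```

Keeping the matching in its own function also makes it easy to try new blacklist entries against real queries from your logs before deploying them.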

Be careful to list only terms that never appear in legitimate searches. Parts of spammer URLs and, for example, Chinese characters on a site with no Chinese content are fairly safe bets. You can find the search terms causing problems on your site in the site search logs (Dashboard > User searches).
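
For the Chinese-character case, listing individual words may not scale well. One possible alternative, offered here as an assumption and not as anything Relevanssi provides, is to test for any character in the Han script with a Unicode-aware regular expression. The function name rlv_has_han_characters is hypothetical.

```php
/**
 * Hypothetical helper: true if $search contains at least one Han character.
 * \p{Han} matches the Han script; the u modifier makes the pattern UTF-8 aware.
 */
function rlv_has_han_characters( $search ) {
    return 1 === preg_match( '/\p{Han}/u', $search );
}
```

Inside rlv_block_search, the result could be checked alongside the blacklist loop before sending the 410 response. Han characters are also used in Japanese text, so only use a check like this on a site where such searches are never legitimate.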

11 comments Keyword-based search blocking

  1. Hi there,

    Is this also the recommended approach to stop what looks to me like SQL injection attempts?

    Recently, my logs are full of search terms such as:

    puerto plata’ and sleep(3) and ‘1

    puerto plata1111111111111’ union select char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45),char(45,120,52,45,81,45),char(45,120,53,45,81,45),char(45,120,54,45,81,45),char(45,12

    Thank you.

    1. Yes, that’s an injection attack attempt. No need to worry about it, though. It’s not possible to do an SQL injection attack through Relevanssi search, because all search terms are escaped and safe. Blocking can be used to keep these queries from polluting your logs, but it’s not necessary for security.

  2. Tried this and it successfully blocked those words, but then the bots hitting our site just adapted with other messages. We get about 8,000 site searches a day from these bots. Hoping to figure out how to block them so we don’t have to move away from this plugin. Any thoughts?
    Thank you,
    Troy

    1. Troy, if you’re blocking at the WordPress level, it’s already too late. The blocking should be done at the server level. Something like fail2ban or Nginx Ultimate Bad Bot Blocker is likely a much better tool for a job like this.

  3. When you say blocking at the server level, are you suggesting by IP? I have already blocked all countries but the US and am still getting crazy amounts of spam bot searches.

    1. Dan, yes, by IP and with whatever other tools there are at the server level, things like fail2ban and so on. I am by no means an expert on that and don’t even know the full range of tools available at the server level.

  4. Thanks a lot!
    Recently I had an attack of this kind, with almost 1,000,000 indexed pages and 100k per day. My Google rank is also down accordingly :/
    I found common factors in their queries: “Link:”, “879783” and “878720”. I just added your function to the site and also blocked /search/ and */search/ in robots.txt. I’ll wait a few days to see if my rank returns to normal.
    Also, I have no trace in my DB.
