Words with ampersands can’t be found

A lot of people on my site are looking for ‘H&M’. I have an H&M page but it does not show up when searching with Relevanssi. How can I change the plugin so that words with the &-sign get found?

By default Relevanssi cleans out ampersands (and other punctuation). In order to keep them, you’ll have to modify the way the punctuation is handled. A simple solution to fix the problem is this:

add_filter('relevanssi_remove_punctuation', 'saveampersands_1', 9);
function saveampersands_1($a) {
    $a = str_replace('&amp;', 'AMPERSAND', $a);
    $a = str_replace('&', 'AMPERSAND', $a);
    return $a;
}
 
add_filter('relevanssi_remove_punctuation', 'saveampersands_2', 11);
function saveampersands_2($a) {
    $a = str_replace('AMPERSAND', '&', $a);
    return $a;
}

Add this code to your functions.php file and rebuild the index. To protect a character other than the ampersand, replace the & in the code with that character. For more complicated modifications, it’s best to replace the whole relevanssi_remove_punct() function: unhook the default function, copy it, modify the copy as needed, and hook in the new function.
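The same protect-and-restore idea can be extended to several characters at once. The following is a minimal sketch, not part of Relevanssi: the function names and the placeholder words are hypothetical, and it assumes the placeholders never occur in your real content.

```php
<?php
// Omit the opening tag above when pasting into an existing functions.php.

// Hypothetical map: character to protect => letters-only placeholder.
// The placeholders are assumptions; choose words that never occur in your content.
function my_protected_chars() {
    return array(
        '_' => 'UNDERSCORE',
        '-' => 'HYPHEN',
    );
}

// Meant to run before the default punctuation removal (priority 9).
function my_protect_chars( $a ) {
    $map = my_protected_chars();
    return str_replace( array_keys( $map ), array_values( $map ), $a );
}

// Meant to run after the default punctuation removal (priority 11).
function my_restore_chars( $a ) {
    $map = my_protected_chars();
    return str_replace( array_values( $map ), array_keys( $map ), $a );
}

// Register the hooks only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_remove_punctuation', 'my_protect_chars', 9 );
    add_filter( 'relevanssi_remove_punctuation', 'my_restore_chars', 11 );
}
```

As with the ampersand version, the index has to be rebuilt after adding the filters, because the same processing must apply both when indexing and when searching.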

Update 12.2.2014: Adding

$a = str_replace('&amp;', 'AMPERSAND', $a);

to the first function also covers ampersands that are stored as proper HTML entities.
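When both the entity form and the bare character are handled, the order of the replacements matters. This small standalone sketch (the function names are made up for the illustration) shows why the entity should be replaced first:

```php
<?php
// Entity form first, then the bare character.
function protect_ampersands( $a ) {
    return str_replace( array( '&amp;', '&' ), 'AMPERSAND', $a );
}

// Bare character first: the '&' inside '&amp;' is consumed and 'amp;' is left behind.
function protect_ampersands_wrong_order( $a ) {
    return str_replace( array( '&', '&amp;' ), 'AMPERSAND', $a );
}

echo protect_ampersands( 'H&amp;M' ), "\n";             // HAMPERSANDM
echo protect_ampersands_wrong_order( 'H&amp;M' ), "\n"; // HAMPERSANDamp;M
```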

In some cases there’s no need to keep a punctuation mark, but it makes sense to remove it completely instead of replacing it with a space. This simplifies the code a bit. For example, to keep hyphens inside words from causing problems, add this code:

add_filter('relevanssi_remove_punctuation', 'remove_hyphens', 9);
function remove_hyphens($a) {
    $a = str_replace('-', '', $a);
    return $a;
}

Add this code to your functions.php file and rebuild the index.

This was originally asked at the WP support forum.

27 comments Words with ampersands can’t be found

  1. Hi. Is this hook valid for a Multisite Installation?

    I did try (in our functions.php):

    add_filter('relevanssi_remove_punctuation', 'saveampersands_1', 9);
    function saveampersands_1($a) {
        $a = str_replace('&', 'AMPERSAND', $a);
        return $a;
    }
    add_filter('relevanssi_remove_punctuation', 'saveampersands_2', 11);
    function saveampersands_2($a) {
        $a = str_replace('AMPERSAND', '&', $a);
        return $a;
    }

    OR

    remove_filter('relevanssi_remove_punctuation', 'relevanssi_remove_punct');
    add_filter('relevanssi_remove_punctuation', 'enc_relevanssi_remove_punct');

    function enc_relevanssi_remove_punct($a) {
        $a = strip_tags($a);
        $a = stripslashes($a);
        $a = str_replace("·", '', $a);
        $a = str_replace("…", '', $a);
        $a = str_replace("€", '', $a);
        $a = str_replace("&shy;", '', $a);
        $a = str_replace(chr(194) . chr(160), ' ', $a);
        $a = str_replace("&nbsp;", ' ', $a);
        $a = str_replace('&#8217;', ' ', $a);
        $a = str_replace("'", ' ', $a);
        $a = str_replace("’", ' ', $a);
        $a = str_replace("‘", ' ', $a);
        $a = str_replace("”", ' ', $a);
        $a = str_replace("“", ' ', $a);
        $a = str_replace("„", ' ', $a);
        $a = str_replace("´", ' ', $a);
        $a = str_replace("—", ' ', $a);
        $a = str_replace("–", ' ', $a);
        $a = str_replace("×", ' ', $a);
        $a = str_replace('&', 'AMPERSAND', $a);
        $a = preg_replace('/[[:punct:]]+/u', ' ', $a);
        $a = str_replace('AMPERSAND', '&', $a);
        $a = preg_replace('/[[:space:]]+/', ' ', $a);
        $a = trim($a);
        return $a;
    }

    We rebuilt the index for all blogs, but searches such as “D&G” or “Build&Beader” still didn’t work.

    We are Premium users; could you help us?

    Thank you.

    1. Multisite uses the same code, so yes, this is valid for Multisite installations as well. However, I’m not sure where the code should be added… You should check if the code is being executed in the first place: add an echo and an exit to the function and see if it’s even run.

      If it’s being executed, then it should work, so I’m guessing it’s just not being noticed. I’m not sure whether the code should be at the network level or at the individual blog level, so try the different options.

      1. Thank you! Do you mean I should try the first solution, the one suggested in your post? I will try.
        I am quite sure the code was executed, because I wrote “die($a);” and I was able to see myAMPERSANDquery and my&query (I tried with “D&G”).
        I will let you know. Thank you.

        1. I recommend the first solution, but both should work.

          Do note that this has to work in two places: both in indexing and in searching. If the code executes when searching but not when indexing, or vice versa, searching won’t work.

    1. add_filter('relevanssi_remove_punctuation', 'saveampersands_1', 9);
      function saveampersands_1($a) {
          $a = str_replace('-', 'HYPHEN', $a);
          return $a;
      }

      add_filter('relevanssi_remove_punctuation', 'saveampersands_2', 11);
      function saveampersands_2($a) {
          $a = str_replace('HYPHEN', '-', $a);
          return $a;
      }

      1. I feel like ‘HYPHEN’ is still a potential risk. Couldn’t we use something like ‘%HYPHEN%’ to ensure it’s unique?
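One caveat about the placeholder choice discussed in this thread: the placeholder itself passes through the punctuation removal, so it should contain only letters or digits. The sketch below imitates the default step with the two regular expressions quoted in the copied function earlier in this thread; it is a stand-in under that assumption, not the full Relevanssi function, and the helper names are hypothetical.

```php
<?php
// Stand-in for the default step at priority 10, built from the two regexes
// quoted earlier in this thread. Not the complete Relevanssi function.
function simulate_default_punct_removal( $a ) {
    $a = preg_replace( '/[[:punct:]]+/u', ' ', $a );
    $a = preg_replace( '/[[:space:]]+/', ' ', $a );
    return trim( $a );
}

// Protect hyphens with the given placeholder, strip punctuation, then restore.
function round_trip( $text, $placeholder ) {
    $a = str_replace( '-', $placeholder, $text );
    $a = simulate_default_punct_removal( $a );
    return str_replace( $placeholder, '-', $a );
}

echo round_trip( 'GE-0001', '%HYPHEN%' ), "\n"; // GE HYPHEN 0001 (the % signs were stripped)
echo round_trip( 'GE-0001', 'XHYPHENX' ), "\n"; // GE-0001
```

A letters-only token that is unlikely to appear in real content, such as the made-up 'XHYPHENX' here, reduces the collision risk without being damaged by the punctuation step.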

  2. Don’t know what I am doing wrong. Added this code to my functions.php

    add_filter('relevanssi_remove_punctuation', 'saveampersands_1', 9);
    function saveampersands_1($a) {
        $a = str_replace('_', 'UNDERSCORE', $a);
        return $a;
    }

    add_filter('relevanssi_remove_punctuation', 'saveampersands_2', 11);
    function saveampersands_2($a) {
        $a = str_replace('UNDERSCORE', '_', $a);
        return $a;
    }

    When I search for, say, GE_0001, I get 0 results, but I should get about 40.
    I tried all kinds of variations, such as taking the hyphen code and only replacing the - with _, all with the same result: nothing. I need to keep the _ in the results, since the site is a shop whose SKUs contain underscores.

    1. Karin, the code is correct, so I’d next check that Relevanssi is indexing the SKUs in the first place. If you create a product with a SKU without an underscore, for example “TESTSKU”, can you find it? If not, start by fixing that.

      1. Relevanssi was indexing the SKU. I had it working until the latest update of Relevanssi. I didn’t have the code in functions.php; I just removed the lines in the plugin’s common.php. That worked, but it is not working anymore either. Is there a way to completely empty the index and start indexing all over again? Maybe that will help.

          1. I rebuilt the index over and over again, but when I search for ge_0001 I still get all the results for GE and 0001.

          2. Hmm, hard to say. The code is correct. You might want to try and see that it runs, just to be sure. Add this in the first function before the “return $a;” line and then try to save a post:

            echo "it runs";
            exit();

            Now you should see a white page with the text “it runs”. If you don’t see a white page and the post is saved as usual, then your code does not run and that’s where the problem is.

          3. That works. I had to remove the functions because they gave “no results” where there should be results. I commented out the line for the underscore replacement in common.php, then emptied the Relevanssi database table and built the index again. No result: the index is again full of loose components of the SKUs.

          4. It is working! I completely uninstalled Relevanssi from my website and installed it fresh again (after reactivating the code in functions.php). I rebuilt the index, and when I search for LB_0001 I only get leatherbands and nothing else 🙂 Thank you for thinking along with me. The code was fine; the installation had a bug somewhere.
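The echo-and-exit check suggested in this thread stops the request outright. A gentler variant, sketched here under the assumption that you can read your PHP error log, records what reaches the filter and passes the text on unchanged. The function name is hypothetical; error_log() is a standard PHP function, and where its output ends up depends on your PHP and WordPress configuration.

```php
<?php
// Omit the opening tag above when pasting into an existing functions.php.

// Hypothetical helper: log every string that reaches the filter, then pass it on.
function my_log_punctuation_input( $a ) {
    error_log( 'relevanssi_remove_punctuation received: ' . $a );
    return $a; // always return the value, otherwise the text is lost
}

// Register the hook only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_remove_punctuation', 'my_log_punctuation_input', 8 );
}
```

If entries appear when a post is saved but not when a search is run (or the other way round), the filter is active in only one of the two places where it needs to run. Remove the logging once the check is done, since it writes a line for every processed string.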

  3. Hi Mikko,
    I have Arabic text that has punctuation. Older versions of Relevanssi used to ignore the punctuation correctly and find the text.
    I had Relevanssi off for around a year, then installed it again to resume using it.
    The newest version does not detect the text, so when I search using the “unpunctuated” text, I don’t get any search results.

    I tried to apply the same concept above using the following function:

    add_filter('relevanssi_remove_punctuation', 'arabic_filter', 9);
    function arabic_filter($a) {
        $tashkeeel = array("ّ", "َ", "ً", "ُ", "ٌ", "ِ", "ٍ", "ْ");
        $a = str_replace($tashkeeel, "", $a);
    }

    (The punctuation characters are between the quotes, but might not appear properly.)

    However, when i do so and rebuild the index, the index does not detect any posts and says:

    Documents in the index: 0
    Terms in the index: 0
    Highest post ID indexed: 0

    I have to remove the function to be able to regain the indexing again.

    Am I doing something wrong in my functions?

    thank you very much

    1. You’re eliminating all text, because you haven’t remembered to actually return any value. Your filter is a black hole that swallows everything… so just add a “return $a;” in the end and you’ll be fine =)
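For reference, a corrected version of the filter from this comment might look like the sketch below. It assumes the eight marks in the original comment are the tashkeel characters U+064B to U+0652, which could not be verified from the rendered page, and it uses the "\u{...}" escape syntax that requires PHP 7.0 or later.

```php
<?php
// Omit the opening tag above when pasting into an existing functions.php.

// Assumption: the eight marks in the comment are the tashkeel U+064B to U+0652.
function arabic_filter( $a ) {
    $tashkeel = array(
        "\u{064B}", "\u{064C}", "\u{064D}", // fathatan, dammatan, kasratan
        "\u{064E}", "\u{064F}", "\u{0650}", // fatha, damma, kasra
        "\u{0651}", "\u{0652}",             // shadda, sukun
    );
    $a = str_replace( $tashkeel, '', $a );
    return $a; // without this line the filter returns null and nothing gets indexed
}

// Register the hook only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_remove_punctuation', 'arabic_filter', 9 );
}
```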

  4. I have a product with the SKU “OM100/827/LED”, and when I search for it I get 516 results: all the products that have LED indexed. It looks like the search is run for “OM100 827 LED”, with the slashes “/” removed. Is there any workaround for this?

      1. Well, I tried that first and it just changes it to &2F, but I found a solution: select the exact term in the settings instead of the alternatives, and it works as expected. Thanks.
