
Search is ignoring accents

Searches usually ignore accents, which is generally a good idea: in French, for example, the difference between e and é is small, and it’s fine if the search engine isn’t too picky about which is which.

However, in some languages the difference is significant. In Finnish, for example, ä isn’t an a with an accent: it’s a completely different letter that sits at the other end of the alphabet and is pronounced differently. On most WordPress sites, a search for rätti also finds ratti, which is wrong.

Relevanssi isn’t actually responsible for these mix-ups (nor for the useful behaviour of ignoring accents). The cause is the MySQL database collation, which controls how data is sorted and compared. The default collation for Relevanssi tables is utf8mb4_unicode_ci, a good general-purpose collation, but in some specific cases it may be too generic.

Finnish (and Swedish) users may want to use utf8mb4_swedish_ci, which, among other things, treats a and ä as separate letters. You can change the collation in the database table settings with tools like phpMyAdmin or Adminer. There’s a general collation for the table and specific collations for individual columns. The ones that matter here are the collations of the term and term_reverse columns.
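You may prefer typing SQL to clicking through the column settings. In that case, the change is an ALTER TABLE ... MODIFY statement. The sketch below builds one in plain PHP.

Some parts of it are assumptions:
- collation_sql() is a hypothetical helper, not part of Relevanssi.
- The wp_ table prefix may differ on your site.
- The varchar(50) NOT NULL definitions are placeholders, not Relevanssi’s confirmed schema.

MODIFY replaces the whole column definition. Copy each column’s real type and attributes from phpMyAdmin or Adminer before running anything.

```php
<?php
// Hypothetical helper (not part of Relevanssi): builds an ALTER TABLE
// statement that switches the given columns to another collation.
// $columns maps column name => [ type, attributes ]. MODIFY replaces the
// whole column definition, so both must match the existing column exactly.
function collation_sql( string $table, array $columns, string $collation ): string {
	// Assumes the collation name starts with its character set,
	// e.g. "utf8mb4" in "utf8mb4_swedish_ci".
	$charset = strstr( $collation, '_', true );
	if ( false === $charset ) {
		throw new InvalidArgumentException( "Unexpected collation name: {$collation}" );
	}
	$clauses = array();
	foreach ( $columns as $name => [ $type, $attributes ] ) {
		// CHARACTER SET and COLLATE go right after the data type.
		$clauses[] = "MODIFY `{$name}` {$type} CHARACTER SET {$charset} COLLATE {$collation} {$attributes}";
	}
	return "ALTER TABLE `{$table}` " . implode( ', ', $clauses ) . ';';
}

// Placeholder definitions (assumed): check the real ones in the table structure view first.
$columns = array(
	'term'         => array( 'varchar(50)', 'NOT NULL' ),
	'term_reverse' => array( 'varchar(50)', 'NOT NULL' ),
);
echo collation_sql( 'wp_relevanssi', $columns, 'utf8mb4_swedish_ci' ), PHP_EOL;
```

You can then run the printed statement in the SQL window of phpMyAdmin or Adminer.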

wp_relevanssi database table structure view in Adminer

If you switch to a collation that does not ignore accents, you also need to stop Relevanssi from removing the accents itself. Add this to your site, then rebuild the index, since the words already in the index were stored without accents:

// Stop Relevanssi from stripping accents when it removes punctuation.
remove_filter( 'relevanssi_remove_punctuation', 'remove_accents', 9 );
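The reason both changes are needed can be sketched in plain PHP. This is a minimal illustration, not Relevanssi’s code, and it rests on these assumptions:
- fold_accents() is a hypothetical, heavily simplified stand-in for WordPress’s remove_accents(), which handles far more characters.
- terms_match() is a hypothetical helper.
- The strict === comparison stands in for an accent-sensitive collation. It is also case-sensitive, unlike the _ci collations.
- The same folding is applied to both the indexed words and the search terms.

```php
<?php
// Hypothetical, simplified stand-in for WordPress's remove_accents():
// it maps only a handful of characters, enough for this example.
function fold_accents( string $word ): string {
	return strtr( $word, array(
		'ä' => 'a', 'Ä' => 'A',
		'ö' => 'o', 'Ö' => 'O',
		'å' => 'a', 'Å' => 'A',
		'é' => 'e', 'É' => 'E',
	) );
}

// Hypothetical helper: compares an indexed word with a search term,
// optionally folding accents on both sides first. The strict comparison
// plays the role of an accent-sensitive collation.
function terms_match( string $indexed, string $query, bool $fold ): bool {
	if ( $fold ) {
		$indexed = fold_accents( $indexed );
		$query   = fold_accents( $query );
	}
	return $indexed === $query;
}

var_dump( terms_match( 'ratti', 'rätti', true ) );  // bool(true): the accent is gone before comparison
var_dump( terms_match( 'ratti', 'rätti', false ) ); // bool(false): ä and a stay distinct
```

With folding, ratti and rätti become the same string before they are compared, so no collation setting can tell them apart. Without folding, the comparison sees two different words.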
