Search is ignoring accents

Searches usually ignore accents, and that is generally a good idea: in French, for example, the difference between e and é isn’t huge, and it’s fine if the search engine isn’t too picky about which is which.

However, there are cases where the difference is big. In Finnish, for example, ä isn’t an a with an accent: it’s a completely different letter that sits at the other end of the alphabet and has a different pronunciation. On most WordPress sites you can search for rätti and find ratti, which is wrong.

Relevanssi isn’t actually responsible for these mix-ups (nor for the useful behaviour of ignoring accents). Both come from the MySQL database collation, which controls how text is sorted and compared. The default collation for Relevanssi tables is utf8mb4_unicode_ci, which is a good general-purpose collation. In some specific cases, however, it may be too generic.
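To make the effect of a generic, accent-insensitive comparison concrete, here is a minimal Python sketch. It is only an approximation for illustration: MySQL compares strings using collation weight tables, not this code, and the helper names fold_accents and generic_equal are made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and drop combining marks (accents, umlauts).

    A rough stand-in for an accent- and case-insensitive comparison;
    this is NOT how MySQL implements utf8mb4_unicode_ci.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def generic_equal(a: str, b: str) -> bool:
    """Compare two strings after folding away accents and case."""
    return fold_accents(a) == fold_accents(b)


print(generic_equal("élève", "eleve"))  # True: handy for French
print(generic_equal("rätti", "ratti"))  # True: wrong for Finnish
```

Under this kind of comparison, ä and a look identical, so a search for rätti also matches ratti.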

Finnish (and Swedish) users may want to use utf8mb4_swedish_ci, which, for example, makes a and ä separate letters. You can change the collation in the database table settings with tools like phpMyAdmin or Adminer. There’s a general collation for the table and specific collations for individual columns; the ones that matter most here are the collations of the term and term_reverse columns.
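As a rough illustration of what a Swedish-style comparison changes, the Python sketch below keeps å, ä and ö as letters of their own while still folding other accents and case. It is an assumption-laden approximation and ignores the finer rules of the real utf8mb4_swedish_ci collation; fold_swedish and swedish_equal are hypothetical names for this example only.

```python
import unicodedata

# Letters treated as distinct from their base letters in this sketch.
DISTINCT_LETTERS = frozenset("åäö")


def fold_swedish(text: str) -> str:
    """Fold case and accents, but leave å, ä and ö untouched.

    An approximation for illustration, not MySQL's actual algorithm.
    """
    folded = []
    # NFC makes sure "a" + combining diaeresis becomes the single letter "ä".
    for ch in unicodedata.normalize("NFC", text.casefold()):
        if ch in DISTINCT_LETTERS:
            folded.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            folded.append(
                "".join(c for c in decomposed if not unicodedata.combining(c))
            )
    return "".join(folded)


def swedish_equal(a: str, b: str) -> bool:
    """Compare two strings with å, ä and ö kept as separate letters."""
    return fold_swedish(a) == fold_swedish(b)


print(swedish_equal("rätti", "ratti"))  # False: ä and a are different letters
print(swedish_equal("Café", "cafe"))    # True: other accents are still ignored
```

With this behaviour, a search for rätti no longer matches ratti, while accent-insensitive matching still works for letters outside å, ä and ö.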

wp_relevanssi database table structure view in Adminer
