By default, Relevanssi tends to prefer longer posts. The default TF × IDF weights Relevanssi uses simply count the term frequency, i.e. how many times a word appears in the post. That favors longer posts, because the search term usually appears in them more often. However, a 500-word post with 15 appearances of the search term may well be a better match than a 2000-word post with 20 appearances, because the density of the term is much higher in the shorter post.
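To put numbers on the density comparison above, here is a minimal sketch. The helper rlv_example_term_density() is a hypothetical name made up for this illustration; it is not part of Relevanssi and is not used by the filter functions in this post.

```php
<?php
// Hypothetical helper for illustration only; not a Relevanssi function.
// Returns the share of words in a post that are the search term.
function rlv_example_term_density( int $term_count, int $word_count ): float {
    // Guard against empty posts to avoid dividing by zero.
    if ( $word_count <= 0 ) {
        return 0.0;
    }
    return $term_count / $word_count;
}

// The two posts from the example above.
printf( "Short post: %.1f%%\n", 100 * rlv_example_term_density( 15, 500 ) );  // Short post: 3.0%
printf( "Long post:  %.1f%%\n", 100 * rlv_example_term_density( 20, 2000 ) ); // Long post:  1.0%
```

The longer post has more raw matches, but the shorter one is three times as dense. A plain term frequency count does not see that difference.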

One way to make Relevanssi give a boost to shorter posts is to take the document length into account in the calculations. Adding this function, hooked to the relevanssi_match filter hook, to your site will include inverse document length in the weights:

add_filter( 'relevanssi_match', 'rlv_inverse_document_length', 10, 2 );
function rlv_inverse_document_length( $match, $idf ) {
    global $relevanssi_post_idl;
    if ( isset( $relevanssi_post_idl[ $match->doc ] ) ) {
        $idl = $relevanssi_post_idl[ $match->doc ];
    } else {
        $current_post_object = relevanssi_get_post( $match->doc );
        $minimum_doc_length  = 5000; // In characters; strlen() counts bytes, so multibyte text measures longer.
        if ( ! $current_post_object ) {
            $idl = 1;
        } else {
            $post_length = max( $minimum_doc_length, strlen( $current_post_object->post_content ) );
            $idl         = $minimum_doc_length / $post_length;
        }
        $relevanssi_post_idl[ $match->doc ] = $idl;
    }
    // Scale the existing weight by the inverse document length. This equals
    // weight / ( tf * idf ) * tf * idf * idl, but cannot divide by zero
    // when tf or idf is 0.
    $match->weight *= $idl;
    return $match;
}

The function measures the post length in characters rather than words, because counting words is slower, and then divides the minimum post length set in the function by that length. Here the minimum is set to 5000 characters, so every post is treated as at least 5000 characters long: posts at or below that length get a multiplier of 1, and longer posts get a multiplier that goes down from 1 towards 0 as the post gets longer.
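The multiplier is easier to judge with a few concrete lengths. The sketch below isolates the same ratio in a stand-alone helper. The name rlv_example_idl() is hypothetical and only exists for this illustration, and the sketch assumes the minimum length is a positive number.

```php
<?php
// Hypothetical stand-alone helper mirroring the ratio used in the filter;
// not a Relevanssi function. Assumes $minimum_doc_length is greater than 0.
function rlv_example_idl( int $post_length, int $minimum_doc_length = 5000 ): float {
    // Posts shorter than the minimum are treated as exactly the minimum length.
    $effective_length = max( $minimum_doc_length, $post_length );
    return $minimum_doc_length / $effective_length;
}

foreach ( array( 1200, 5000, 10000, 50000 ) as $length ) {
    printf( "%6d characters: multiplier %.2f\n", $length, rlv_example_idl( $length ) );
}
// Prints:
//   1200 characters: multiplier 1.00
//   5000 characters: multiplier 1.00
//  10000 characters: multiplier 0.50
//  50000 characters: multiplier 0.10
```

Raising the minimum makes the penalty start later, and lowering it makes the penalty bite on shorter posts, so the value is worth tuning against the typical post length on your site.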

In effect, posts at or below the minimum length keep their weight unchanged, while longer posts are scaled down. That gives shorter posts a relative boost and punishes very long posts that rank high just for being long.

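The version above only considers post content and does not count attachment content. The version below also includes the attachment text read from the _relevanssi_pdf_content custom field. Use it instead of the first version, not in addition to it, because both declare the same function name: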

add_filter( 'relevanssi_match', 'rlv_inverse_document_length', 10, 2 );
function rlv_inverse_document_length( $match, $idf ) {
    global $relevanssi_post_idl;
    if ( isset( $relevanssi_post_idl[ $match->doc ] ) ) {
        $idl = $relevanssi_post_idl[ $match->doc ];
    } else {
        $current_post_object = relevanssi_get_post( $match->doc );
        $minimum_doc_length  = 5000; // In characters; strlen() counts bytes, so multibyte text measures longer.
        if ( ! $current_post_object ) {
            $idl = 1;
        } else {
            $post_content = $current_post_object->post_content;
            $pdf_content  = get_post_meta( $current_post_object->ID, '_relevanssi_pdf_content', true );
            $content      = $post_content . ' ' . $pdf_content;
            $post_length  = max( $minimum_doc_length, strlen( $content ) );
            $idl          = $minimum_doc_length / $post_length;
        }
        $relevanssi_post_idl[ $match->doc ] = $idl;
    }
    // Scale the existing weight by the inverse document length. This equals
    // weight / ( tf * idf ) * tf * idf * idl, but cannot divide by zero
    // when tf or idf is 0.
    $match->weight *= $idl;
    return $match;
}
