
Controlling attachment types in index

Relevanssi lets you index attachments: not their contents, but their names and descriptions. But perhaps you only want to index a particular type of attachment? The Relevanssi settings offer no control over that; it’s either all attachments or nothing.

It is, however, possible to choose which kinds of attachments are indexed. This is done with the relevanssi_do_not_index filter hook, which lets you decide whether a particular post is indexed or not. You can check the attachment’s MIME type to see what kind of attachment it is and use that information to weed out unwanted attachments.

Version 4.0.9 and Premium 2.1.5 introduced a new filter hook, relevanssi_indexing_restriction, which can also be used for exclusions in indexing. This filter works on a MySQL WHERE clause, to which your function appends its own conditions. It’s slightly more complicated to formulate, but has the benefit of running very early in the indexing process. If you’re excluding lots of posts, this is a much better way to do the exclusion: indexing runs much faster and the progress meter makes much more sense.

If you’re excluding only a couple of posts, then relevanssi_do_not_index is likely the better option.

Note that in 4.0.9 and 2.1.5, relevanssi_indexing_restriction only applies when indexing all posts and not when saving an individual post. In future versions, starting from 4.0.10 and 2.1.6, it is also applied when a post is saved.

No images

To remove all image attachments from the index, add one of these snippets to your theme’s functions.php file and rebuild the index. Each weeds out all attachments that have a MIME type that begins with “image”.

add_filter( 'relevanssi_indexing_restriction', 'rlv_no_image_attachments' );
function rlv_no_image_attachments( $restriction ) {
    global $wpdb;
    // Exclude every attachment whose MIME type begins with "image".
    $restriction .= " AND post.ID NOT IN (SELECT ID FROM $wpdb->posts WHERE post_type = 'attachment' AND post_mime_type LIKE 'image%' ) ";
    return $restriction;
}

The same exclusion with relevanssi_do_not_index (use one version or the other, not both):

add_filter( 'relevanssi_do_not_index', 'rlv_do_not_index_images', 10, 2 );
function rlv_do_not_index_images( $block, $post_id ) {
    // get_post_mime_type() returns false when the post has no MIME type.
    $mime = get_post_mime_type( $post_id );
    if ( 'image' === substr( (string) $mime, 0, 5 ) ) {
        $block = true;
    }
    return $block;
}
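
On versions 4.0.9 and 2.1.5, where relevanssi_indexing_restriction is not applied when a single post is saved, one option is to register both hooks so that bulk indexing and individual saves follow the same rule. The sketch below assumes that approach. The rlv_demo_* function names are hypothetical, the restriction is assumed to be passed as a string as in the snippets above, and the add_filter() calls are guarded so the code also loads outside WordPress.

```php
// Hypothetical helper: true when the MIME type begins with "image".
function rlv_demo_is_image_mime( $mime ) {
    return is_string( $mime ) && 0 === strpos( $mime, 'image' );
}

// Bulk indexing: skip image attachments in the SQL query.
function rlv_demo_restrict_images( $restriction ) {
    global $wpdb;
    $restriction .= " AND post.ID NOT IN (SELECT ID FROM {$wpdb->posts} WHERE post_type = 'attachment' AND post_mime_type LIKE 'image%' ) ";
    return $restriction;
}

// Saving a single post: block image attachments one at a time.
function rlv_demo_block_image_on_save( $block, $post_id ) {
    if ( rlv_demo_is_image_mime( get_post_mime_type( $post_id ) ) ) {
        $block = true;
    }
    return $block;
}

// Register only inside WordPress, where add_filter() exists.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_indexing_restriction', 'rlv_demo_restrict_images' );
    add_filter( 'relevanssi_do_not_index', 'rlv_demo_block_image_on_save', 10, 2 );
}
```

From 4.0.10 and 2.1.6, where the restriction is also applied on save, the second registration should be redundant but harmless.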

Only PDFs

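Of all attachments, these snippets index only PDFs. Posts that have no MIME type, such as regular posts and pages, are not affected. Use one version or the other, not both.

add_filter( 'relevanssi_indexing_restriction', 'rlv_only_pdfs' );
function rlv_only_pdfs( $restriction ) {
    global $wpdb;
    // The post_type condition limits the exclusion to attachments; regular posts have an empty MIME type and stay in the index.
    $restriction .= " AND post.ID NOT IN (SELECT ID FROM $wpdb->posts WHERE post_type = 'attachment' AND post_mime_type != 'application/pdf' ) ";
    return $restriction;
}

The same rule with relevanssi_do_not_index:

add_filter( 'relevanssi_do_not_index', 'rlv_do_not_index_non_pdfs', 10, 2 );
function rlv_do_not_index_non_pdfs( $block, $post_id ) {
    $mime = get_post_mime_type( $post_id );
    // Only posts with a MIME type (attachments) are considered.
    if ( ! empty( $mime ) ) {
        // Block the attachment unless its MIME type ends in "pdf".
        $block = ( 'pdf' !== substr( $mime, -3 ) );
    }
    return $block;
}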
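
The per-post check extends to several allowed attachment types. The following is a sketch rather than anything provided by Relevanssi: the rlv_demo_* function names are hypothetical and the list of MIME types is only an example. Posts without a MIME type (regular posts and pages) keep whatever decision was passed in, as in the relevanssi_do_not_index version above.

```php
// Example allow-list; adjust to the attachment types you want indexed.
function rlv_demo_allowed_mime_types() {
    return array(
        'application/pdf',
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    );
}

// Pure decision: should a post with this MIME type be blocked?
function rlv_demo_should_block_mime( $mime, $block = false ) {
    if ( empty( $mime ) ) {
        return $block; // Not an attachment: keep the incoming decision.
    }
    return ! in_array( $mime, rlv_demo_allowed_mime_types(), true );
}

// Filter callback: looks up the MIME type and applies the decision.
function rlv_demo_only_allowed_types( $block, $post_id ) {
    return rlv_demo_should_block_mime( get_post_mime_type( $post_id ), $block );
}

// Register only inside WordPress, where add_filter() exists.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'relevanssi_do_not_index', 'rlv_demo_only_allowed_types', 10, 2 );
}
```

Keeping the decision in a function that takes a plain MIME string makes it possible to check the rule without a WordPress install. As with the other snippets, rebuild the index after adding it.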

12 comments on “Controlling attachment types in index”

  1. I tried this but it doesn’t seem to be working. Is there another way to remove them? They’re coming in first place in my search which is the last place I want them.

      1. Hi Mikko, just posts, pages and downloads (from wpdownloadmanager). Also, is there a way to sort the result types? At the moment media is always on top and I want posts to be first, then downloads, then pages (and no images).

  2. Actually… I think I might be talking about a different thing… I’m seeing images and such in the search box dropdown before I go to the whole page results… it’s in the search preview that I want to remove images.

    1. That probably isn’t coming from Relevanssi at all. As far as I can tell, the only Relevanssi-compatible search dropdown is SearchWP Live Ajax Search. If you’re using something else, it’s using the default WP search to get the results.

      1. oh… maybe that is something that came with my template. sorry to have bothered you. I’ll have to keep digging to see what’s going on.

    1. Eric, did you rebuild the index after adding the code? If you didn’t, do that and it should solve the problem. These are indexing filters, and only take effect when you are indexing posts.

      If that doesn’t help, then I would recommend debugging this: take a look at the values get_post_mime_type() is returning.

  3. I’ve tried both types of the above code, then “reset all attachments…” and then “read all unread attachments” and then Relevanssi hangs, telling me “time elapsed 11:44:20 | time remaining about 20 minutes” (numbers changing; longest I let it run was nearly 20 hours). At the bottom of the log it displays “Failed to index attachment id 7760: cURL error 28: Operation timed out after 45007 milliseconds with 0 bytes received\n” which is obviously where it is hanging. I have no idea where to find out what attachment id 7760 is (or any attachment id, for that matter). So I suspect this code no longer works with your current version. True? Also, how to find attachment id? [Premium Relevanssi]

    1. Judy, if the error says “timed out”, then the problem is not in your code or in your attachments, it’s the server: it doesn’t respond in time. The indexing should respond quickly: if nothing happens in a few minutes, something’s wrong and there’s no reason to wait. The US server is slightly unstable at the moment, and I’m investigating it. Meanwhile you can switch to the more reliable EU server, or simply try again later when the server has rebooted and probably responds better.

      To see which attachment the ID 7760 refers to, you can go to /wp-admin/post.php?post=7760&action=edit on your site.
