
Indexing embedded PDFs for the parent post

Relevanssi can automatically index PDF content for the parent post if the PDF (or other attachment) is attached to the parent post in WordPress. However, that’s not always the case. Sometimes the PDF is embedded in the page instead, and an embed doesn’t create a connection between the posts in WordPress. Relevanssi therefore doesn’t know the PDF is embedded in the post and cannot index the PDF contents for the parent post.

One such case is the PDF.js Viewer Shortcode plugin. It uses a shortcode to embed a PDF viewer on the page but creates no connection between the post and the PDF attachment.

It’s still possible to index the PDF contents for the parent post; it just takes some hacking. Add this function to your theme’s functions.php. When a post is indexed, the function finds the pdfjs-viewer shortcodes in the post, grabs the PDF URLs from the shortcodes, finds the attachment posts based on those URLs and appends the PDF contents Relevanssi has already read for those attachments to the indexed content.

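add_filter( 'relevanssi_content_to_index', 'rlv_pdfjs_content', 10, 2 );
function rlv_pdfjs_content( $content, $post ) {
    // Capture the url attribute wherever it appears inside the shortcode. The match
    // stops at the closing quote, so other quoted attributes are not swallowed.
    $m = preg_match_all( '/\[pdfjs-viewer\b[^\]]*?\burl="([^"]+)"/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        $upload_dir = wp_upload_dir();
        foreach ( $matches[1] as $pdf ) {
            // Turn the full URL into the uploads-relative path stored in _wp_attached_file.
            $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $pdf ) ), '/' );
            // Fetch the PDF text Relevanssi has already read for that attachment.
            $pdf_content = $wpdb->get_var( $wpdb->prepare( "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )", $pdf_url ) );
            if ( $pdf_content ) {
                // Separate with a space so words from different sources don't merge.
                $content .= ' ' . $pdf_content;
            }
        }
    }
    return $content;
}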
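If the PDF contents don’t show up, it can help to check which attachment paths the function actually looks up. The URL extraction and path conversion don’t need WordPress, so they can be tried on their own. This is a minimal sketch: the helper name rlv_pdfjs_attachment_paths, the sample post content and the example uploads base URL are hypothetical, and in a real site the base URL would come from wp_upload_dir().

```php
<?php
// Hypothetical helper: extracts PDF URLs from [pdfjs-viewer] shortcodes and
// converts them to uploads-relative paths, the form WordPress stores in the
// _wp_attached_file meta field.
function rlv_pdfjs_attachment_paths( string $post_content, string $upload_baseurl ): array {
    $paths = array();
    // The url attribute may appear anywhere inside the shortcode; the capture
    // stops at the closing quote.
    if ( preg_match_all( '/\[pdfjs-viewer\b[^\]]*?\burl="([^"]+)"/', $post_content, $matches ) ) {
        foreach ( $matches[1] as $url ) {
            // ".../uploads/2021/03/a.pdf" becomes "2021/03/a.pdf".
            $paths[] = ltrim( str_replace( $upload_baseurl, '', urldecode( $url ) ), '/' );
        }
    }
    return $paths;
}

// Hypothetical sample content with two shortcodes, url first and url last.
$content = 'Intro [pdfjs-viewer url="https://example.com/wp-content/uploads/2021/03/My%20Report.pdf" viewer_width="600px"] text'
    . ' [pdfjs-viewer viewer_height="700px" url="https://example.com/wp-content/uploads/2021/04/b.pdf"]';

// Prints the two relative paths: 2021/03/My Report.pdf and 2021/04/b.pdf
print_r( rlv_pdfjs_attachment_paths( $content, 'https://example.com/wp-content/uploads' ) );
```

If a printed path doesn’t match the attachment’s _wp_attached_file value exactly (for example because the site URL uses http in one place and https in the other), the database lookup returns nothing for that PDF.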
