relevanssi_indexing_tokens

apply_filters( 'relevanssi_indexing_tokens', array $tokens, string $context )

Filters the indexing tokens before they are added to the indexing data.

Parameters

$tokens
(array) An array of token => frequency pairs.

$context
(string) The context for these tokens (possible values include content, title, comments, taxonomy-{taxonomy}, author, custom_field, excerpt, mysql-content, internal-links, pdf-content, user-meta, user-fields, user-description, term-description, term-name, posttype-decription, posttype-name).
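As a minimal sketch of how the two parameters work together, the callback below removes purely numeric tokens, but only when the context is comments. The function name my_drop_numeric_comment_tokens is hypothetical and not part of Relevanssi. The add_filter() call is guarded so the snippet can also be run outside WordPress.

```php
<?php
/**
 * Drops purely numeric tokens from comment content before indexing.
 *
 * Hypothetical example callback; not part of Relevanssi.
 *
 * @param array  $tokens  An array of token => frequency pairs.
 * @param string $context The context for these tokens.
 *
 * @return array The filtered token => frequency pairs.
 */
function my_drop_numeric_comment_tokens( $tokens, $context ) {
	if ( 'comments' !== $context ) {
		return $tokens; // Leave every other context untouched.
	}

	foreach ( array_keys( $tokens ) as $token ) {
		// PHP casts numeric string keys to integers, so cast back before matching.
		if ( 1 === preg_match( '/^[0-9]+$/', (string) $token ) ) {
			unset( $tokens[ $token ] );
		}
	}

	return $tokens;
}

// Ask for two accepted arguments so that $context is passed to the callback.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_indexing_tokens', 'my_drop_numeric_comment_tokens', 10, 2 );
}
```

Without the fourth argument to add_filter(), WordPress passes only $tokens to the callback, and the context check would not be possible.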

More information

When Relevanssi indexes the various parts of a post, there’s usually some cleanup first (removing HTML tags and so on). Then the content is tokenized, which means it’s split into individual words. At this point punctuation is removed, along with stopwords and words that are shorter than the minimum length specified in the indexing settings.

The tokenizer returns an array of token => frequency pairs, so “Mary had a little lamb, a little lamb” would become array( 'mary' => 1, 'little' => 2, 'lamb' => 2 ) (“a” is too short, and “had” is a stopword). These tokens are then passed through this filter hook.
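The token => frequency format can be illustrated with a simplified stand-in for the tokenizer. This is a sketch under stated assumptions, not Relevanssi's actual tokenizer: the function name my_simple_tokenize, the one-word stopword list and the minimum length of 3 are all hypothetical, and the code handles ASCII text only.

```php
<?php
/**
 * Simplified stand-in for a tokenizer (hypothetical, not Relevanssi's own code).
 *
 * @param string $text       The text to tokenize.
 * @param array  $stopwords  Words to skip (assumed list for this example).
 * @param int    $min_length Minimum token length (assumed value for this example).
 *
 * @return array An array of token => frequency pairs.
 */
function my_simple_tokenize( $text, $stopwords = array( 'had' ), $min_length = 3 ) {
	// Lowercase, then replace every run of non-alphanumeric characters with a space.
	$text  = strtolower( $text );
	$text  = preg_replace( '/[^a-z0-9]+/', ' ', $text );
	$words = preg_split( '/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY );

	$tokens = array();
	foreach ( $words as $word ) {
		// Skip words that are too short and words on the stopword list.
		if ( strlen( $word ) < $min_length || in_array( $word, $stopwords, true ) ) {
			continue;
		}
		// Count how many times each remaining word appears.
		$tokens[ $word ] = isset( $tokens[ $word ] ) ? $tokens[ $word ] + 1 : 1;
	}

	return $tokens;
}

$tokens = my_simple_tokenize( 'Mary had a little lamb, a little lamb' );
// $tokens is array( 'mary' => 1, 'little' => 2, 'lamb' => 2 ).
```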

Relevanssi Premium uses this filter hook internally to add synonyms to the index when synonym indexing is enabled: the tokens are examined, and if a token has a synonym, that too is included in the token list:

function relevanssi_add_indexing_synonyms( $tokens ) {
	global $relevanssi_variables;

	if ( ! isset( $relevanssi_variables['synonyms'] ) ) {
		relevanssi_create_synonym_replacement_array();
	}

	$new_tokens = array();
	$synonyms   = $relevanssi_variables['synonyms'];

	foreach ( $tokens as $token => $tf ) {
		if ( isset( $synonyms[ $token ] ) ) {
			$token_and_the_synonyms = explode( ' ', $synonyms[ $token ] );
			foreach ( $token_and_the_synonyms as $new_token ) {
				$new_tokens[ $new_token ] = $tf;
			}
		} else {
			$new_tokens[ $token ] = $tf;
		}
	}

	return $new_tokens;
}
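
The same pattern can be used in a custom callback. As a hedged sketch, the function below keeps every original token and adds expansions for a few abbreviations, summing the frequencies when an expansion already appears in the token list. The function name my_add_abbreviation_tokens and the abbreviation map are hypothetical; the add_filter() call is guarded so the snippet can also be run outside WordPress.

```php
<?php
/**
 * Adds expansions of known abbreviations to the indexing tokens.
 *
 * Hypothetical example callback; not part of Relevanssi.
 *
 * @param array $tokens An array of token => frequency pairs.
 *
 * @return array The token => frequency pairs with the expansions added.
 */
function my_add_abbreviation_tokens( $tokens ) {
	// Hypothetical map of token => space-separated extra tokens.
	$expansions = array(
		'wp'  => 'wordpress',
		'seo' => 'search engine optimization',
	);

	// Start from a copy so that all original tokens are kept.
	$new_tokens = $tokens;

	foreach ( $tokens as $token => $tf ) {
		if ( ! isset( $expansions[ $token ] ) ) {
			continue;
		}
		foreach ( explode( ' ', $expansions[ $token ] ) as $extra ) {
			// Add to the existing frequency instead of overwriting it.
			$existing             = isset( $new_tokens[ $extra ] ) ? $new_tokens[ $extra ] : 0;
			$new_tokens[ $extra ] = $existing + $tf;
		}
	}

	return $new_tokens;
}

if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_indexing_tokens', 'my_add_abbreviation_tokens' );
}
```

Because this callback ignores the context, it takes only $tokens and the default single accepted argument is enough.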