F.37. unaccent

unaccent is a text search dictionary that removes accents (diacritic signs) from lexemes. It is a filtering dictionary, which means its output is always passed to the next dictionary (if any), unlike the standard behaviour of dictionaries. Currently, it supports the most important accents from European languages.

Limitation: The current implementation of the unaccent dictionary cannot be used as a normalizing dictionary for the thesaurus dictionary.

F.37.1. Configuration

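An unaccent dictionary accepts the following options:

RULES is the base name of the file containing the list of translation rules. This file must be stored in $SHAREDIR/tsearch_data/ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). Its name must end in .rules (which is not to be included in the RULES parameter).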

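The rules file has the following format:

Each line represents a pair, consisting of a character with accent followed by a character without accent. The first is translated into the second.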

For an example, see unaccent.rules, which is installed in $SHAREDIR/tsearch_data/.

F.37.2. Usage

Running the installation script creates a text search template unaccent and a dictionary unaccent based on it, with default parameters. You can alter the parameters, for example:

=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');

or create new dictionaries based on the template.
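
As a minimal sketch of that second option, a new dictionary could be defined from the template as shown here. The dictionary name my_unaccent is hypothetical, and the example assumes that a file my_rules.rules has been placed in $SHAREDIR/tsearch_data/:

```sql
-- Hypothetical dictionary name; assumes my_rules.rules exists
-- in $SHAREDIR/tsearch_data/.
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);
```

Creating a separate dictionary leaves the default unaccent dictionary untouched, which may be preferable when different configurations need different rule sets.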

To test the dictionary, you can try:

=# select ts_lexize('unaccent','Hôtel');
 ts_lexize 
-----------
 {Hotel}
(1 row)

Filtering dictionaries are useful for the correct operation of the ts_headline function. The following example inserts unaccent into a text search configuration:

=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector    
-------------------
 'hotel':1 'mer':4
(1 row)

=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column? 
----------
 t
(1 row)

=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline       
------------------------
 <b>Hôtel</b> de la Mer
(1 row)

F.37.3. Function

The unaccent function removes accents (diacritic signs) from the argument string. Basically, it is a wrapper around the unaccent dictionary.

   unaccent([dictionary, ] string) returns text

SELECT unaccent('unaccent','Hôtel');
SELECT unaccent('Hôtel');
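
Because the function takes and returns plain text, one possible use is accent-insensitive matching in ordinary queries. The following is a minimal sketch under stated assumptions: the table hotels and its column name are hypothetical, and the default unaccent dictionary is assumed to be installed:

```sql
-- Hypothetical table used only for illustration.
CREATE TEMP TABLE hotels (name text);
INSERT INTO hotels VALUES ('Hôtel de la Mer'), ('Grand Hotel');

-- Strip accents from both the column and the pattern,
-- then compare case-insensitively.
SELECT name
FROM hotels
WHERE unaccent(name) ILIKE unaccent('%hôtel%');
```

With the default rules, both rows should match, since the accented and unaccented spellings reduce to the same string before comparison.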