Generally in our business application, we need to implement search functionality on database column like title, firstname, lastname, city etc. It will create problem in search result when data contains accent characters and the database collation is accent-sensitive. so we will get wrong result.
Let’s take an example, you need to search the database for movies having a director that contains “Shohei”. So we will write query like below:
select movieTitle from movies where director like '%Shohei%'
Ideally, the result set should include below:
Youssef Chahine,Shôhei Imamura
Shôhei Imamura,Claude Lelouch
If your database collation is accent-sensitive, the result set includes only:
Note that directors containing accent characters are excluded from the result set. This is because accent characters are excluded from the query. Unfortunately, this is not the desired result.
Now I will show you how can we search accent characters in sql query. For this, first we need to understand collection in sql.
What is collation in SQL?
A collation in SQL Server is a defined set of rules for sorting and comparing data. This includes rules for case-sensitivity and accent-sensitivity. When you select a collation (for your server, database, column, or expression), you assign specific characteristics to your data that affect the results of database operations.
When we installed SQL Server 2008 R2/SQL Server 2012, the default collation was SQL_Latin1_General_CP1_CI_AS. The collation name is broken into these parts:
- SQL = indicates a SQL Server collation name
- Latin1_General = identifies the alphabet whose sorting rules are applied
- CP1 = identifies Code Page 1252, which is a character encoding of the Latin alphabet
- CI = defines case-insensitive comparison rules
- AS = defines accent-sensitive comparison rules
I have already write one blog for the case sensitive search in sql server where we have use COLLATE Latin1_General_CS_AS = ‘Case sensitive Data’
How can we implement accent search in sql server. There are some solution for it.
1. Configure the database so that it is not accent-sensitive; change the collation to SQL_Latin1_General_CP1_CI_AI.
2. Remove and replace accented characters in the query.
select movieTitle from movies where replace(director, 'ô', 'o')like '%Shohei%'
3. Specify an accent-insensitive collation in the query
select movieTitle from movies where director like '%Shohei%' collate SQL_Latin1_General_CP1_CI_AI
Option 1 is simplest from a programmer’s perspective, because it is a permanent solution, server-wide and database-wide, and it requires no programming changes to applications. When you are installing a new SQL Server instance then this is easy to do.
Generally we are hosting application on hosting server so we cannot manage this option because they have already installed database with default option accent-sensitivity.
Options 2 and 3 are simplest from a database administrator’s perspective, because they require no changes to the database. This is useful for shared hosting environment.
Hope this will help.