C++ CSS HTML Java JavaScript MySQL Oracle PERL PHP SQL Unix VBScript XHTML XML Сети
Determining the Default Character Set and Collation (MySQL 4.0)
 
Determining the Default Character Set and Collation
===================================================


defaulting leads to natural and obvious results.

Server Character Set and Collation
----------------------------------

The MySQL Server has a server character set and a server collation,
which may not be null.

MySQL determines the server character set and server collation thus:

   * According to the option settings in effect when the server starts
     up.


 the
character set, and along with it you can add
`--default-collation=collation_name' for the collation. If you don't
specify a character set, that is the same as saying
`--default-character-set=latin1'. If you specify only a character set
(for instance, `latin1') but not a collation, that is the same as
saying `--default-charset=latin1' `--collation=latin1_swedish_ci'
because `latin1_swedish_ci' is the default collation for `latin1'.
Therefore the following three commands all have the same effect:

     shell> mysqld
     shell> mysqld --default-character-set=latin1
     shell> mysqld --default-character-set=latin1
                --default-collation=latin1_swedish_ci

One way to change the settings is by recompiling. If you want to change
the default server character set and collation when building from
sources, use: `--with-character-set' and `--with-collation' as
arguments for `configure'. For example:

     shell> ./configure --with-character-set=latin1

or

     shell> ./configure --with-character-set=latin1
                --with-collation=latin1_german1_ci

Both `mysqld' and `configure' check that the character set/collation
combination is valid. Each program displays an error message and
terminates if the combination is not valid.

Database Character Set and Collation
------------------------------------


character set and collation:

     CREATE DATABASE db_name
        [CHARACTER SET character_set_name [COLLATE collation_name]]
     
     ALTER DATABASE db_name
         [CHARACTER SET character_set_name [COLLATE collation_name]]

Example:

     CREATE DATABASE db_name
        CHARACTER SET latin1 COLLATE latin1_swedish_ci;

MySQL chooses the database character set and database collation thus:

   * If both `CHARACTER SET X' and `COLLATE Y' were specified, then
     character set `X' and collation `Y'.

   * If `CHARACTER SET X' was specified without `COLLATE', then
     character set `X' and its default collation.

   * Otherwise, the server character set and server collation.


MySQL's `CREATE DATABASE ... CHARACTER SET ...' syntax is analogous to
the standard-SQL `CREATE SCHEMA ... CHARACTER SET ...' syntax. Because
of this, it is possible to create databases with different character
sets and collations, on the same MySQL server.

The database character set and collation are used as default values if
the table character set and collation are not specified in `CREATE
TABLE' statements. They have no other purpose.

Table Character Set and Collation
---------------------------------

Every table has a table character set and a table collation, which may
not be null. The `CREATE TABLE' and `ALTER TABLE' statements now have
optional clauses for specifying the table character set and collation:

     CREATE TABLE table_name ( column_list )
        [CHARACTER SET character_set_name [COLLATE collation_name]]
     
     ALTER TABLE table_name
        [CHARACTER SET character_set_name] [COLLATE collation_name]

Example:

     CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci;

MySQL chooses the table character set and collation thus:

   * If both `CHARACTER SET X' and `COLLATE Y' were specified, then
     character set `X' and collation `Y'.

   * If `CHARACTER SET X' was specified without `COLLATE', then
     character set `X' and its default collation.

   * Otherwise, the database character set and collation.


The table character set and collation are used as default values, if
the column character set and collation are not specified in individual
column definitions. The table character set and collation are MySQL
extensions; there are no such things in standard SQL.

Column Character Set and Collation
----------------------------------

Every "character" column (that is, a column of type `CHAR', `VARCHAR',
or `TEXT') has a column character set and a column collation, which may
not be null. Column definition syntax now has optional clauses for
specifying the column character set and collation:

     column_name {CHAR | VARCHAR | TEXT} (column_length)
         [CHARACTER SET character_set_name [COLLATE collation_name]]

Example:

     CREATE TABLE Table1
     (
        column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci
     );

MySQL chooses the column character set and collation thus:

   * If both `CHARACTER SET X' and `COLLATE Y' were specified, then
     character set `X' and collation `Y'.

   * If `CHARACTER SET X' was specified without `COLLATE', then
     character set `X' and its default collation.

   * Otherwise, the table character set and collation.


The `CHARACTER SET' and `COLLATE' clauses are standard SQL.

Examples of Character Set and Collation Assignment
--------------------------------------------------

The following examples show how MySQL determines default character set
and collation values.

Example 1: Table + Column Definition
....................................

     CREATE TABLE t1
     (
       c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
     ) CHARACTER SET latin2 COLLATE latin2_bin;

 in a `latin2' table.

Example 2: Table + Column Definition
....................................

     CREATE TABLE t1
     (
        c1 CHAR(10) CHARACTER SET latin1
     ) CHARACTER SET latin1 COLLATE latin1_danish_ci;

This time we have a column with a `latin1' character set and a default
collation. Now, although it might seem natural, the default collation
is not taken from the table level. Instead, because the default
collation for `latin1' is always `latin1_swedish_ci', column `c1' will
have a collation of `latin1_swedish_ci' (not `latin1_danish_ci').

Example 3: Table + Column Definition
....................................

     CREATE TABLE t1
     (
        c1 CHAR(10)
     ) CHARACTER SET latin1 COLLATE latin1_danish_ci;

We have a column with a default character set and a default collation.
In this circumstance, MySQL looks up to the table level for inspiration
in determining the column character set and collation. So the character
set for column `c1' is `latin1' and its collation is `latin1_danish_ci'.

Example 4: Database + Table + Column Definition
...............................................

     CREATE DATABASE d1 CHARACTER SET latin2 COLLATE latin2_czech_ci;
     USE d1;
     CREATE TABLE t1
     (
        c1 CHAR(10)
     );

We create a column without specifying its character set and collation.
We're also not specifying a character set and a collation at the table
level. In this circumstance, MySQL looks up to the database level for
inspiration. (The database's settings become the table's settings, and
thereafter become the column's setting.) So the character set for
column `c1' is `latin2' and its collation is `latin2_czech_ci'.

Connection Character Sets and Collations
----------------------------------------

Every connection has connection character sets and connection
collations, which may not be null. There are actually two connection
character sets, which we will call "connection/literals" and
"connection/results" when it is necessary to distinguish them.

Consider what a "connection" is: It's what you make when you connect to
the server. The client sends SQL statements, such as queries, over the
connection to the server. The server sends responses, such as result
sets, over the connection back to the client. This leads to several
questions, such as: (a) what character set is the query in when it
leaves the client? (b) what character set should the server translate a
query to after receiving it? (c) what character set should the server
translate to before shipping result sets or error messages back to the
client? You can fine-tune the setting for these things, or you can
depend on the defaults (in which case, you can skip this section).

There are two statements that affect the connection character sets:

     SET NAMES character_set_name
     SET CHARACTER SET character_set_name

`SET NAMES' indicates what is in the SQL statement that the client
sends. Thus, `SET NAMES cp1251' tells the server "future incoming
messages from this client will be in character set `cp1251'" and the
server is free to translate to its own character set, if appropriate.

`SET CHARACTER SET' indicates what is in the SQL statement that the
client sends, and also what is in the result set that the server sends
back to the client. Thus, `SET CHARACTER SET' includes `SET NAMES', and
also specifies what character set the column values will have if, for
example, you use a `SELECT' statement.

EXAMPLE: Suppose that `column1' is defined as `CHAR(5) CHARACTER SET
latin2'.  If you do not say `SET CHARACTER SET', then for `SELECT
column1 FROM t' the server will send back all the values for `column1'
using character set `latin2'. If on the other hand you say `SET
CHARACTER SET latin1' then the server will, just before sending back,
convert the `latin2' values to `latin1'. Such conversion is slow and
may be lossy.

When you execute `SET NAMES' or `SET CHARACTER SET', you are also
changing the "connection collation". However, the connection collation
exists for consistency only. Usually its value doesn't matter.

With the `mysql' client, it is not necessary to execute `SET NAMES'
every time you start up. You can add the `--default-character-set-name'
option setting to your `mysql' statement line, or in your option file.
For example, the following option file setting will change the
connection character set each time you run `mysql':

     [mysql]
     default-character-set-name=character_set_name

Character String Literal Character Set and Collation
----------------------------------------------------

Every character string literal has a character set and a collation,
which may not be null.

A character string literal may have an optional character set
introducer and `COLLATE' clause:

     [_character_set_name]'string' [COLLATE collation_name]

Examples:
     SELECT 'string';
     SELECT _latin1'string';
     SELECT _latin1'string' COLLATE latin1_danish_ci;

The simple statement `SELECT 'string'' uses the connection/literal
character set.

 we emphasize that an introducer does not cause any conversion, it
is strictly a signal that does not change the string's value. An
introducer is also legal before standard hex literal and numeric hex
literal notation (`x'literal'' and `0xnnnn'), and before `?' (parameter
substitution when using prepared statements within a programming
language interface).

Examples:

     SELECT _latin1 x'AABBCC';
     SELECT _latin1 0xAABBCC;
     SELECT _latin1 ?;

MySQL determines a literal's character set and collation thus:

   * If both `_X' and `COLLATE Y' were specified then the literal
     character set is `X' and the literal collation is `Y'

   * If `_X' is specified but `COLLATE' is not specified, then the
     literal character set is `X' and the literal collation is `X''s
     default collation

   * Otherwise, the connection/literals character set and collation.


Examples:

   * A string with `latin1' character set and `latin1_german1_ci'
     collation:

          SELECT _latin1'Mu"ller' COLLATE latin1_german1_ci;

   * A string with `latin1' character set and its default collation,
     that is, `latin1_swedish_ci':

          SELECT _latin1'Mu"ller';

   * A string with the connection/literals character set and collation:

          SELECT 'Mu"ller';


Character set introducers and the `COLLATE' clause are implemented
according to standard-SQL specifications.

`COLLATE' Clause in Various Parts of an SQL Query
-------------------------------------------------

With the `COLLATE' clause you can override whatever the default
collation is for a comparison. `COLLATE' may be used in various parts
of SQL queries.  Here are some examples:

   * With `ORDER BY':

          SELECT k
          FROM t1
          ORDER BY k COLLATE latin1_german2_ci;

   * With `AS':

          SELECT k COLLATE latin1_german2_ci AS k1
          FROM t1
          ORDER BY k1;

   * With `GROUP BY':

          SELECT k
          FROM t1
          GROUP BY k COLLATE latin1_german2_ci;

   * With aggregate functions:

          SELECT MAX(k COLLATE latin1_german2_ci)
          FROM t1;

   * With `DISTINCT':

          SELECT DISTINCT k COLLATE latin1_german2_ci
          FROM t1;

   * With `WHERE':

          SELECT *
          FROM t1
          WHERE _latin1 'Mu"ller' COLLATE latin1_german2_ci = k;

   * With `HAVING':

          SELECT k
          FROM t1
          GROUP BY k
          HAVING k = _latin1 'Mu"ller' COLLATE latin1_german2_ci;


`COLLATE' Clause Precedence
---------------------------

The `COLLATE' clause has high precedence (higher than ||), so the
expression

     x || y COLLATE z

is equivalent to:
     x || (y COLLATE z)

`BINARY' Operator
-----------------

 that
column `a' is of character set `latin1', these two queries have the
same effect:

     SELECT * FROM t1 ORDER BY BINARY a;
     SELECT * FROM t1 ORDER BY a COLLATE latin1_bin;

Note: Every character set has a binary collation.

Some Special Cases Where the Collation Determination is Tricky
--------------------------------------------------------------


collation of column `x'":

     SELECT x FROM T ORDER BY x;
     SELECT x FROM T WHERE x = x;
     SELECT DISTINCT x FROM T;

However, when multiple operands are involved, there can be ambiguity.
For example:

     SELECT x FROM T WHERE x = 'Y';

Should this query use the collation of the column `x', or of the string
literal `'Y''?

 these
rules would take care of most situations:

   * An explicit `COLLATE' clause has precedence 4.

   * A concatenation of two strings with different collations has
     precedence 3.

   * A column's collation has precedence 2.

   * A literal's collation has precedence 1.


Those rules resolve ambiguities thus:

   * Use the collation with the highest precedence.

   * If both sides have the same precedence, then it's an error if the
     collations aren't the same.


Examples:

`column1 = 'A''                      Use collation of `column1'
`column1 = 'A' COLLATE x'            Use collation of `'A''
`column1 COLLATE x = 'A' COLLATE y'  Error

Collations Must Be for the Right Character Set
----------------------------------------------


`latin2_bin' collation is not legal with the `latin1' character set:

     mysql> SELECT _latin1 'x' COLLATE latin2_bin;
     ERROR 1251: COLLATION 'latin2_bin' is not valid
     for CHARACTER SET 'latin1'

An example of the Effect of Collation
-------------------------------------

Suppose column `X' in table `T' has these `latin1' column values:

     Muffler
     Mu"ller
     MX Systems
     MySQL

And suppose that the column values are retrieved using the following
statement:

     SELECT X FROM T ORDER BY X COLLATE collation_name;

The resulting order of the values for different collations is shown in
this table:

              MX Systems               MX Systems
MySQL                    MySQL                    MySQL

 it,
which the Germans call U-umlaut, but we'll call it U-diaeresis.

The first column shows the result of the `SELECT' using the
Swedish/Finnish collating rule, which says that U-diaeresis sorts with
Y.

The second column shows the result of the `SELECT' using the German
DIN-1 rule, which says that U-diaeresis sorts with U.

The third column shows the result of the `SELECT' using the German
DIN-2 rule, which says that U-diaeresis sorts with UE.

Three different collations, three different results. That's what MySQL
is here to handle. By using the appropriate collation, you can choose
the sort order you want.

[Назад] [Содержание] [Вперед]

Главная