The Character Sets and Collations that MySQL Supports (MySQL 4.0)
 
The Character Sets and Collations that MySQL Supports
=====================================================

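Here is a list of the character sets and collations that MySQL
supports. Some installations may also have items that are not on the
list, because defining new character sets or collations is
straightforward.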

MySQL supports 70+ collations for 30+ character sets.

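     mysql> SHOW CHARACTER SET;
     +----------+-----------------------------+---------------------+--------+
     | Charset  | Description                 | Default collation   | Maxlen |
     +----------+-----------------------------+---------------------+--------+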
     | big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
     | dec8     | DEC West European           | dec8_swedish_ci     |      1 |
     | cp850    | DOS West European           | cp850_general_ci    |      1 |
     | hp8      | HP West European            | hp8_english_ci      |      1 |
     | koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
     | latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |      1 |
     | latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
     | swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
     | ascii    | US ASCII                    | ascii_general_ci    |      1 |
     | ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
     | sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
     | cp1251   | Windows Cyrillic            | cp1251_bulgarian_ci |      1 |
     | hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
     | tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
     | euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
     | koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
     | gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
     | greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
     | cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
     | gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
     | latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
     | armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
     | utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
     | ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
     | cp866    | DOS Russian                 | cp866_general_ci    |      1 |
     | keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
     | macce    | Mac Central European        | macce_general_ci    |      1 |
     | macroman | Mac West European           | macroman_general_ci |      1 |
     | cp852    | DOS Central European        | cp852_general_ci    |      1 |
     | latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
     | cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
     | cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
     | binary   | Binary pseudo charset       | binary              |      1 |
     +----------+-----------------------------+---------------------+--------+
     33 rows in set (0.01 sec)
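The `Maxlen' column gives the maximum number of bytes that one character can occupy in each character set. A string of N characters therefore needs at most N × Maxlen bytes. The following is a minimal Python sketch of that arithmetic. The `MAXLEN' dictionary copies a few rows from the table above, and `max_bytes' is a hypothetical helper, not part of MySQL.

```python
# Maxlen values copied from a few rows of the SHOW CHARACTER SET output.
MAXLEN = {
    "latin1": 1,
    "big5": 2,
    "sjis": 2,
    "ucs2": 2,
    "ujis": 3,
    "utf8": 3,
}


def max_bytes(charset: str, n_chars: int) -> int:
    """Worst-case byte length of n_chars characters in the given charset."""
    if n_chars < 0:
        raise ValueError("n_chars must be non-negative")
    return MAXLEN[charset] * n_chars


for cs in MAXLEN:
    print(f"{cs:7} 20 characters -> at most {max_bytes(cs, 20)} bytes")
```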

NB: All character sets have a binary collation. The binary collation
is not listed in every description that follows.
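A binary collation compares strings by the numeric values of their bytes. A `_ci' (case-insensitive) collation treats uppercase and lowercase letters as equal. The Python sketch below imitates both orderings for plain ASCII words. It is only an approximation, because real MySQL collations also assign their own weights to accented and other characters. The function names are hypothetical.

```python
WORDS = ["banana", "Apple", "apple", "Banana"]


def binary_order(words):
    # Binary collation sketch: compare raw byte values ("A" is 0x41,
    # "a" is 0x61), so uppercase ASCII letters sort before lowercase ones.
    return sorted(words, key=lambda w: w.encode("latin-1"))


def case_insensitive_order(words):
    # Case-insensitive sketch: compare lower-cased text. Words that differ
    # only in case compare equal and keep their input order (stable sort).
    return sorted(words, key=str.lower)


print(binary_order(WORDS))            # ['Apple', 'Banana', 'apple', 'banana']
print(case_insensitive_order(WORDS))  # ['Apple', 'apple', 'banana', 'Banana']
```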

The Unicode Character Sets
--------------------------

MySQL has two Unicode character sets, `utf8' and `ucs2'. [...] that will be
happening soon. For now, they have default case-insensitive,
accent-insensitive collations, plus the binary collation.

     +---------+-----------------+-------------------+--------+
     | Charset | Description     | Default collation | Maxlen |
     +---------+-----------------+-------------------+--------+
     | utf8    | UTF-8 Unicode   | utf8_general_ci   |      3 |
     | ucs2    | UCS-2 Unicode   | ucs2_general_ci   |      2 |
     +---------+-----------------+-------------------+--------+
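The `Maxlen' values in this table follow from the encodings. For characters in the Basic Multilingual Plane (U+0000 to U+FFFF), UTF-8 uses one to three bytes, while UCS-2 always uses two. The Python sketch below checks this with Python's `utf-8' and `utf-16-be' codecs. It assumes that UTF-16 without a byte-order mark stands in for UCS-2, which holds for BMP characters. `byte_lengths' is a hypothetical helper.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UCS-2 bytes) for one BMP character."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected a single character in U+0000..U+FFFF")
    # Lone surrogates (U+D800..U+DFFF) are not characters; encode() raises.
    utf8_len = len(ch.encode("utf-8"))
    # UTF-16 big-endian without a byte-order mark is identical to UCS-2
    # for characters inside the Basic Multilingual Plane.
    ucs2_len = len(ch.encode("utf-16-be"))
    return utf8_len, ucs2_len


for ch in ["A", "é", "€", "中"]:
    utf8_len, ucs2_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X} {ch}: utf8={utf8_len} ucs2={ucs2_len}")
```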

Platform-Specific Character Sets
--------------------------------

     +----------+-----------------------------+---------------------+--------+
     | Charset  | Description                 | Default collation   | Maxlen |
     +----------+-----------------------------+---------------------+--------+
     | dec8     | DEC West European           | dec8_swedish_ci     |      1 |
     | hp8      | HP West European            | hp8_english_ci      |      1 |
     +----------+-----------------------------+---------------------+--------+

Character Sets for South Europe and Middle East
-----------------------------------------------

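     +----------+-----------------------------+---------------------+--------+
     | Charset  | Description                 | Default collation   | Maxlen |
     +----------+-----------------------------+---------------------+--------+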
     | armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
     | cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
     | hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
     | greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
     | latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
     | geostd8  | Georgian                    | geostd8_general_ci  |      1 |
     +----------+-----------------------------+---------------------+--------+

The Asian Character Sets
------------------------

The Asian character sets that we support cover Chinese, Japanese,
Korean, and Thai. These character sets can be complicated. For example,
the Chinese sets have to allow for thousands of different characters.

     +----------+-----------------------------+---------------------+--------+
     | Charset  | Description                 | Default collation   | Maxlen |
     +----------+-----------------------------+---------------------+--------+
     | big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
     | gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
     | gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
     | euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
     | ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
     | sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
     | tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
     +----------+-----------------------------+---------------------+--------+
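Most of these Asian character sets are multibyte. In the samples below, ASCII letters take one byte, while the Chinese, Japanese, and Korean characters take two. The Python sketch demonstrates this with Python's built-in codecs. The mapping of Python codec names to the MySQL names in the table (`euc_jp' for `ujis', `shift_jis' for `sjis', `euc_kr' for `euckr') is an assumption based on the table's descriptions. The sample characters were chosen only for illustration.

```python
# Sample text per Python codec: an ASCII letter followed by one CJK character.
SAMPLES = {
    "big5": "A中",       # Big5 Traditional Chinese
    "gb2312": "A中",     # GB2312 Simplified Chinese
    "gbk": "A中",        # GBK Simplified Chinese
    "euc_kr": "A한",     # EUC-KR Korean (MySQL euckr)
    "shift_jis": "Aあ",  # Shift-JIS Japanese (MySQL sjis)
    "euc_jp": "Aあ",     # EUC-JP Japanese (MySQL ujis)
}


def char_widths(codec: str, text: str) -> list[int]:
    """Number of bytes each character of text occupies in the given codec."""
    return [len(ch.encode(codec)) for ch in text]


for codec, text in SAMPLES.items():
    print(f"{codec:9} {text!r}: {char_widths(codec, text)}")
```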

The Baltic Character Sets
-------------------------

The Baltic character sets cover Estonian, Latvian, and Lithuanian
languages. There are two Baltic character sets currently supported:

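   * `latin7' (ISO 8859-13 Baltic):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+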
          | latin7_estonian_cs   | latin7   | 20 |         |          |       0 |
          | latin7_general_ci    | latin7   | 41 | Yes     |          |       0 |
          | latin7_general_cs    | latin7   | 42 |         |          |       0 |
          | latin7_bin           | latin7   | 79 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `cp1257' (Windows Baltic):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | cp1257_lithuanian_ci | cp1257   | 29 |         |          |       0 |
          | cp1257_bin           | cp1257   | 58 |         |          |       0 |
          | cp1257_general_ci    | cp1257   | 59 | Yes     |          |       0 |
          +----------------------+----------+----+---------+----------+---------+
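The collation names in these tables follow a pattern. Each name starts with the character set name, continues with a language or `general', and ends with a suffix: `_ci' (case-insensitive), `_cs' (case-sensitive), or `_bin' (binary). The Python helper sketched below splits a name along that pattern. `parse_collation' and `CollationName' are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class CollationName(NamedTuple):
    charset: str
    variant: str  # a language, "general", or "" for plain <charset>_bin
    suffix: str   # "ci", "cs" or "bin"


def parse_collation(name: str, charset: str) -> CollationName:
    """Split a collation name such as 'latin7_estonian_cs' into its parts."""
    prefix = charset + "_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not belong to charset {charset!r}")
    variant, _, suffix = name[len(prefix):].rpartition("_")
    if suffix not in ("ci", "cs", "bin"):
        raise ValueError(f"unexpected suffix in {name!r}")
    return CollationName(charset, variant, suffix)


# Splitting on the first "_" works here because none of the charset
# names in these tables contains an underscore.
for name in ["latin7_estonian_cs", "latin7_general_ci", "latin7_bin",
             "cp1257_lithuanian_ci"]:
    print(parse_collation(name, name.split("_", 1)[0]))
```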


The Cyrillic Character Sets
---------------------------

Here are the Cyrillic character sets and collations for use with the
Belarusian, Bulgarian, Russian, and Ukrainian languages.

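   * `cp1251' (Windows Cyrillic):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+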
          | cp1251_bulgarian_ci  | cp1251   | 14 |         |          |       0 |
          | cp1251_ukrainian_ci  | cp1251   | 23 |         |          |       0 |
          | cp1251_bin           | cp1251   | 50 |         |          |       0 |
          | cp1251_general_ci    | cp1251   | 51 | Yes     |          |       0 |
          | cp1251_general_cs    | cp1251   | 52 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

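   * `cp866' (DOS Russian):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+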
          | cp866_general_ci     | cp866    | 36 | Yes     |          |       0 |
          | cp866_bin            | cp866    | 68 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `koi8r' (KOI8-R Relcom Russian, primarily used in Russia on Unix):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | koi8r_general_ci     | koi8r    |  7 | Yes     |          |       0 |
          | koi8r_bin            | koi8r    | 74 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `koi8u' (KOI8-U Ukrainian, primarily used in Ukraine on Unix):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | koi8u_general_ci     | koi8u    | 22 | Yes     |          |       0 |
          | koi8u_bin            | koi8u    | 75 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+
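The same Russian text produces different bytes in each Cyrillic character set, which is why stored data has to be read with the character set it was written in. The Python sketch below shows this with the standard `koi8_r', `cp1251', and `cp866' codecs. These codecs are assumed to correspond to the MySQL `koi8r', `cp1251', and `cp866' character sets. The sketch is an illustration, not MySQL code, and `encode_all' is a hypothetical helper.

```python
TEXT = "Привет"  # "Hello" in Russian
CODECS = ("koi8_r", "cp1251", "cp866")


def encode_all(text: str) -> dict[str, bytes]:
    """Encode text with each Cyrillic codec."""
    return {codec: text.encode(codec) for codec in CODECS}


encoded = encode_all(TEXT)
for codec, data in encoded.items():
    print(f"{codec:7} {data.hex(' ')}")

# Reading KOI8-R bytes as if they were cp1251 produces the wrong letters.
garbled = encoded["koi8_r"].decode("cp1251")
print(garbled)
```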


The Central European Character Sets
-----------------------------------

We have some support for character sets used in the Czech Republic,
Slovakia, Hungary, Romania, Slovenia, Croatia, and Poland.

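   * `cp1250' (Windows Central European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+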
          | cp1250_general_ci    | cp1250   | 26 | Yes     |          |       0 |
          | cp1250_czech_ci      | cp1250   | 34 |         | Yes      |       2 |
          | cp1250_bin           | cp1250   | 66 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `cp852' (DOS Central European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | cp852_general_ci     | cp852    | 40 | Yes     |          |       0 |
          | cp852_bin            | cp852    | 81 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `macce' (Mac Central European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | macce_general_ci     | macce    | 38 | Yes     |          |       0 |
          | macce_bin            | macce    | 43 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `latin2' (ISO 8859-2 Central European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | latin2_czech_ci      | latin2   |  2 |         | Yes      |       4 |
          | latin2_general_ci    | latin2   |  9 | Yes     |          |       0 |
          | latin2_hungarian_ci  | latin2   | 21 |         |          |       0 |
          | latin2_croatian_ci   | latin2   | 27 |         |          |       0 |
          | latin2_bin           | latin2   | 77 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `keybcs2' (DOS Kamenicky Czech-Slovak):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | keybcs2_general_ci   | keybcs2  | 37 | Yes     |          |       0 |
          | keybcs2_bin          | keybcs2  | 73 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+


The West European Character Sets
--------------------------------

The West European character sets cover many languages, among them [...] Icelandic,
Irish, Scottish, and English.

   * `latin1' (ISO 8859-1 West European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | latin1_german1_ci    | latin1   |  5 |         |          |       0 |
          | latin1_swedish_ci    | latin1   |  8 | Yes     | Yes      |       0 |
          | latin1_danish_ci     | latin1   | 15 |         |          |       0 |
          | latin1_german2_ci    | latin1   | 31 |         | Yes      |       2 |
          | latin1_bin           | latin1   | 47 |         | Yes      |       0 |
          | latin1_general_ci    | latin1   | 48 |         |          |       0 |
          | latin1_general_cs    | latin1   | 49 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

     `latin1_swedish_ci' is the default collation and is probably the
     one used by the majority of MySQL customers. It is often said to
     be based on the Swedish/Finnish collation rules, but some Swedes
     and Finns disagree with that statement.

     The `latin1_german1_ci' and `latin1_german2_ci' collations are
     based on the DIN-1 and DIN-2 standards, where DIN stands for
     Deutsches Institut für Normung (that is, the German answer to
     ANSI).  DIN-1 is called the dictionary collation and DIN-2 is
     called the phone-book collation.

        * `latin1_german1_ci' (dictionary) rules:

               `Ä' = `A', `Ö' = `O', `Ü' = `U', `ß' = `s'

        * `latin1_german2_ci' (phone-book) rules:

               `Ä' = `AE', `Ö' = `OE', `Ü' = `UE', `ß' = `ss'


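   * `macroman' (Mac West European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+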
          | macroman_general_ci  | macroman | 39 | Yes     |          |       0 |
          | macroman_bin         | macroman | 53 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+

   * `cp850' (DOS West European):
          +----------------------+----------+----+---------+----------+---------+
          | Collation            | Charset  | Id | Default | Compiled | Sortlen |
          +----------------------+----------+----+---------+----------+---------+
          | cp850_general_ci     | cp850    |  4 | Yes     |          |       0 |
          | cp850_bin            | cp850    | 80 |         |          |       0 |
          +----------------------+----------+----+---------+----------+---------+
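The dictionary (`latin1_german1_ci') and phone-book (`latin1_german2_ci') rules described above can be imitated with a case-insensitive sort key. The Python sketch below is only an approximation. It applies just the four German substitutions to upper-cased text and ignores every other accented letter, so it does not reproduce MySQL's actual weight tables. `sort_key', `GERMAN1', and `GERMAN2' are hypothetical names.

```python
GERMAN1 = {"Ä": "A", "Ö": "O", "Ü": "U", "ß": "S"}       # dictionary (DIN-1)
GERMAN2 = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "SS"}  # phone book (DIN-2)


def sort_key(text: str, rules: dict[str, str]) -> str:
    """Case-insensitive key that applies the German substitutions."""
    key = []
    for ch in text:
        if ch == "ß":
            # Handle ß before upper-casing: Python turns "ß".upper() into
            # "SS", which would bypass the dictionary rule ß = s.
            key.append(rules["ß"])
        else:
            upper = ch.upper()
            key.append(rules.get(upper, upper))
    return "".join(key)


names = ["Muller", "Mueller", "Müller", "Mutter"]
print(sorted(names, key=lambda n: sort_key(n, GERMAN1)))
print(sorted(names, key=lambda n: sort_key(n, GERMAN2)))
```

Under the dictionary rules, `Müller' sorts together with `Muller'. Under the phone-book rules, it sorts together with `Mueller'.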

