The Character Sets and Collations that MySQL Supports
=====================================================
The list below is not exhaustive: some installations may have items
that are not on it, because defining new character sets or
collations is straightforward.
MySQL supports 70+ collations for 30+ character sets.
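mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+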
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | ISO 8859-1 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| binary | Binary pseudo charset | binary | 1 |
+----------+-----------------------------+---------------------+--------+
33 rows in set (0.01 sec)
Note: every character set has a binary collation. The binary
collation is not listed in every description that follows.
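To make the distinction concrete, here is a minimal sketch in Python rather than SQL. It assumes that a binary collation orders strings by their encoded byte values and that `str.casefold` is an acceptable stand-in for a case-insensitive (`_ci`) collation. Neither helper reproduces the sort weights of any actual MySQL collation; both function names are hypothetical.

```python
def sort_binary(words, encoding="latin-1"):
    """Order words by their encoded byte values, as a binary collation would."""
    return sorted(words, key=lambda w: w.encode(encoding))


def sort_case_insensitive(words):
    """Order words ignoring letter case (illustrative stand-in for a _ci collation)."""
    return sorted(words, key=str.casefold)


words = ["banana", "Apple", "cherry", "apple", "Banana"]
binary_order = sort_binary(words)
ci_order = sort_case_insensitive(words)

print(binary_order)  # uppercase letters sort before lowercase ones
print(ci_order)      # 'Apple' and 'apple' sort next to each other
```

In the byte-wise ordering every capitalized word precedes every lowercase one, which is why a binary collation is rarely what users want for display sorting.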
The Unicode Character Sets
--------------------------
MySQL has two Unicode character sets, `utf8' and `ucs2'. Additional
collations for them are planned. At present each has a default
case-insensitive, accent-insensitive collation, plus the binary
collation.
+---------+-----------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+-----------------+-------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
+---------+-----------------+-------------------+--------+
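The Maxlen column gives the maximum number of bytes one character can occupy. The sketch below checks this with Python's own codecs, assuming that `utf-8` and `utf-16-be` produce the same bytes as `utf8' and `ucs2' for characters in the Basic Multilingual Plane. The helper name is hypothetical. A character outside that plane needs four bytes in UTF-8, which is more than the Maxlen of 3 shown for `utf8' here.

```python
def encoded_length(ch, codec):
    """Number of bytes a single character occupies in the given codec."""
    return len(ch.encode(codec))


# Characters from the Basic Multilingual Plane.
bmp_samples = ["A", "\u00e9", "\u20ac", "\u4e2d"]  # A, e-acute, euro sign, CJK 'middle'

for ch in bmp_samples:
    print(repr(ch), encoded_length(ch, "utf-8"), encoded_length(ch, "utf-16-be"))

# A supplementary-plane character (U+1F600) for comparison.
supplementary = "\U0001F600"
print(repr(supplementary), encoded_length(supplementary, "utf-8"))
```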
Platform Specific Character Sets
--------------------------------
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
+----------+-----------------------------+---------------------+--------+
Character Sets for South Europe and Middle East
-----------------------------------------------
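+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |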
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| geostd8 | Georgian | geostd8_general_ci | 1 |
+----------+-----------------------------+---------------------+--------+
The Asian Character Sets
------------------------
The Asian character sets that we support include Chinese, Japanese,
Korean, and Thai. These can be complicated. For example, the Chinese
sets have to allow for thousands of different characters.
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
+----------+-----------------------------+---------------------+--------+
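As a rough check of the Maxlen values above, the following sketch encodes one sample character per character set with Python's standard codecs. The mapping from MySQL names to Python codec names is an assumption (the two implementations may differ in detail), and the sample characters are arbitrary. Each sample is a common character, so the sketch does not exercise the three-byte sequences that give `ujis' its Maxlen of 3.

```python
# MySQL name -> (assumed Python codec, Maxlen from the table, sample character)
CHARSETS = {
    "big5":   ("big5",      2, "\u4e2d"),  # CJK 'middle'
    "gb2312": ("gb2312",    2, "\u4e2d"),
    "gbk":    ("gbk",       2, "\u4e2d"),
    "euckr":  ("euc_kr",    2, "\ud55c"),  # Hangul 'han'
    "ujis":   ("euc_jp",    3, "\u65e5"),  # kanji 'sun/day'
    "sjis":   ("shift_jis", 2, "\u65e5"),
    "tis620": ("tis_620",   1, "\u0e01"),  # Thai 'ko kai'
}


def bytes_per_char(codec, ch):
    """Number of bytes one character occupies in the given codec."""
    return len(ch.encode(codec))


lengths = {}
for name, (codec, maxlen, ch) in CHARSETS.items():
    lengths[name] = bytes_per_char(codec, ch)
    print(f"{name:7} {lengths[name]} byte(s), Maxlen {maxlen}")
```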
The Baltic Character Sets
-------------------------
The Baltic character sets cover the Estonian, Latvian, and Lithuanian
languages. Two Baltic character sets are currently supported:
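* `latin7' (ISO 8859-13 Baltic):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+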
| latin7_estonian_cs | latin7 | 20 | | | 0 |
| latin7_general_ci | latin7 | 41 | Yes | | 0 |
| latin7_general_cs | latin7 | 42 | | | 0 |
| latin7_bin | latin7 | 79 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `cp1257' (Windows Baltic):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| cp1257_lithuanian_ci | cp1257 | 29 | | | 0 |
| cp1257_bin | cp1257 | 58 | | | 0 |
| cp1257_general_ci | cp1257 | 59 | Yes | | 0 |
+----------------------+----------+----+---------+----------+---------+
The Cyrillic Character Sets
---------------------------
Here are the Cyrillic character sets and collations for use with the
Belarusian, Bulgarian, Russian, and Ukrainian languages.
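* `cp1251' (Windows Cyrillic):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+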
| cp1251_bulgarian_ci | cp1251 | 14 | | | 0 |
| cp1251_ukrainian_ci | cp1251 | 23 | | | 0 |
| cp1251_bin | cp1251 | 50 | | | 0 |
| cp1251_general_ci | cp1251 | 51 | Yes | | 0 |
| cp1251_general_cs | cp1251 | 52 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
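* `cp866' (DOS Russian):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+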
| cp866_general_ci | cp866 | 36 | Yes | | 0 |
| cp866_bin | cp866 | 68 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `koi8r' (KOI8-R Relcom Russian, primarily used in Russia on Unix):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| koi8r_general_ci | koi8r | 7 | Yes | | 0 |
| koi8r_bin | koi8r | 74 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `koi8u' (KOI8-U Ukrainian, primarily used in Ukraine on Unix):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| koi8u_general_ci | koi8u | 22 | Yes | | 0 |
| koi8u_bin | koi8u | 75 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
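The four Cyrillic character sets are different single-byte encodings of largely the same letters, so the same text produces different bytes in each. The sketch below illustrates this with Python's standard codecs. The mapping from MySQL names to Python codec names is an assumption, and the helper names are hypothetical. It also shows that a Ukrainian-specific letter can be encoded in KOI8-U but not in KOI8-R.

```python
# MySQL name -> assumed Python codec name
CODECS = {"cp1251": "cp1251", "cp866": "cp866", "koi8r": "koi8_r", "koi8u": "koi8_u"}


def encode_all(text):
    """Return {mysql_name: bytes} for every listed character set."""
    return {name: text.encode(codec) for name, codec in CODECS.items()}


def can_encode(text, codec):
    """True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


encoded = encode_all("\u041f\u0440\u0438\u0432\u0435\u0442")  # Russian word for "hello"
for name, raw in encoded.items():
    print(f"{name:7} {raw.hex()}")

ukrainian_letter = "\u0457"  # Cyrillic small letter yi
print(can_encode(ukrainian_letter, "koi8_r"), can_encode(ukrainian_letter, "koi8_u"))
```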
The Central European Character Sets
-----------------------------------
We have some support for character sets used in the Czech Republic,
Slovakia, Hungary, Romania, Slovenia, Croatia, and Poland.
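* `cp1250' (Windows Central European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+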
| cp1250_general_ci | cp1250 | 26 | Yes | | 0 |
| cp1250_czech_ci | cp1250 | 34 | | Yes | 2 |
| cp1250_bin | cp1250 | 66 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `cp852' (DOS Central European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| cp852_general_ci | cp852 | 40 | Yes | | 0 |
| cp852_bin | cp852 | 81 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `macce' (Mac Central European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| macce_general_ci | macce | 38 | Yes | | 0 |
| macce_bin | macce | 43 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `latin2' (ISO 8859-2 Central European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| latin2_czech_ci | latin2 | 2 | | Yes | 4 |
| latin2_general_ci | latin2 | 9 | Yes | | 0 |
| latin2_hungarian_ci | latin2 | 21 | | | 0 |
| latin2_croatian_ci | latin2 | 27 | | | 0 |
| latin2_bin | latin2 | 77 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `keybcs2' (DOS Kamenicky Czech-Slovak):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| keybcs2_general_ci | keybcs2 | 37 | Yes | | 0 |
| keybcs2_bin | keybcs2 | 73 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
The West European Character Sets
--------------------------------
The West European character sets cover languages such as Icelandic,
Irish, Scottish, and English.
* `latin1' (ISO 8859-1 West European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | | 0 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 0 |
| latin1_danish_ci | latin1 | 15 | | | 0 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 0 |
| latin1_general_ci | latin1 | 48 | | | 0 |
| latin1_general_cs | latin1 | 49 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
The `latin1_swedish_ci' collation is the default and is probably
used by the majority of MySQL customers. It is often stated
that this is based on the Swedish/Finnish collation rules, but you
will find Swedes and Finns who disagree with that statement.
The `latin1_german1_ci' and `latin1_german2_ci' collations are
based on the DIN-1 and DIN-2 standards, where DIN stands for
Deutsches Institut für Normung (that is, the German answer to
ANSI). DIN-1 is called the dictionary collation and DIN-2 is
called the phone-book collation.
* `latin1_german1_ci' (dictionary) rules:
`Ä' = `A', `Ö' = `O', `Ü' = `U', `ß' = `s'
* `latin1_german2_ci' (phone-book) rules:
`Ä' = `AE', `Ö' = `OE', `Ü' = `UE', `ß' = `ss'
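The two rule sets can be sketched as sort-key functions. This is a simplified illustration in Python, not MySQL's implementation: it handles only the umlauted vowels and the sharp s listed above, folds everything else to uppercase, and uses hypothetical names (`DIN1`, `DIN2`, `sort_key`).

```python
# Expansion rules, keyed by the uppercase letter (sharp s has no single-letter uppercase).
DIN1 = {"\u00c4": "A", "\u00d6": "O", "\u00dc": "U", "\u00df": "S"}      # dictionary
DIN2 = {"\u00c4": "AE", "\u00d6": "OE", "\u00dc": "UE", "\u00df": "SS"}  # phone book


def sort_key(word, rules):
    """Build a comparison key by applying the given expansion rules."""
    parts = []
    for ch in word:
        if ch == "\u00df":
            # Handle sharp s before upper(), which would turn it into "SS".
            parts.append(rules["\u00df"])
        else:
            upper = ch.upper()
            parts.append(rules.get(upper, upper))
    return "".join(parts)


names = ["M\u00fcller", "Mueller", "Muller"]
for name in names:
    print(name, sort_key(name, DIN1), sort_key(name, DIN2))
```

Under the dictionary rules the umlauted spelling compares equal to "Muller"; under the phone-book rules it compares equal to "Mueller".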
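* `macroman' (Mac West European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+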
| macroman_general_ci | macroman | 39 | Yes | | 0 |
| macroman_bin | macroman | 53 | | | 0 |
+----------------------+----------+----+---------+----------+---------+
* `cp850' (DOS West European):
+----------------------+----------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------------+----------+----+---------+----------+---------+
| cp850_general_ci | cp850 | 4 | Yes | | 0 |
| cp850_bin | cp850 | 80 | | | 0 |
+----------------------+----------+----+---------+----------+---------+