MySQL Localisation and International Usage
==========================================
The Character Set Used for Data and Sorting
-------------------------------------------
By default, MySQL uses the ISO-8859-1 (Latin1) character set with
sorting according to Swedish/Finnish rules. This character set is
suitable for use in the USA and western Europe.
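All standard MySQL binaries are compiled with
`--with-extra-charsets=complex'. This adds code to all standard programs
to handle `latin1' and all multi-byte character sets within the
binary. Other character sets are loaded from a
character-set definition file when needed.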
The character set determines what characters are allowed in names and
how things are sorted by the `ORDER BY' and `GROUP BY' clauses of the
`SELECT' statement.
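You can change the character set with the `--default-character-set'
option when you start the server. The character sets available depend
on the `--with-charset=charset' and `--with-extra-charsets=
list-of-charset | complex | all | none' options to `configure', and the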
character set configuration files listed in `SHAREDIR/charsets/Index'.
*Note configure options::.
If you change the character set when running MySQL (which may also
change the sort order), you must run `myisamchk -r -q
--set-character-set=charset' on all tables. Otherwise, your indexes may
not be ordered correctly.
When a client connects to a MySQL server, the server sends the default
character set in use to the client. The client will switch to use this
character set for this connection.
One should use `mysql_real_escape_string()' when escaping strings for
an SQL query. `mysql_real_escape_string()' is identical to the old
`mysql_escape_string()' function, except that it takes the `MYSQL'
connection handle as the first parameter.
If the client is compiled with different paths than those where the
server is installed, and the user who configured MySQL didn't include
all character sets in the MySQL binary, you must tell the client where
to find the additional character sets it needs when the server runs
with a different character set than the client.
One can specify this by putting the following in a MySQL option file:
     [client]
     character-sets-dir=/usr/local/mysql/share/mysql/charsets
where the path points to the directory in which the dynamic MySQL
character sets are stored.
One can force the client to use a specific character set by specifying:
     [client]
     default-character-set=character-set-name
but normally this is never needed.
German character set
....................
To get German sorting order, you should start `mysqld' with
`--default-character-set=latin1_de'. This will give you the following
characteristics.
When sorting and comparing strings, the following mapping is done on the
strings before doing the comparison:
     ä -> ae
     ö -> oe
     ü -> ue
     ß -> ss
All accented characters are converted to their unaccented uppercase
counterparts. All letters are converted to uppercase.
When comparing strings with `LIKE', the one -> two character mapping is
not done. All letters are converted to uppercase. Accents are removed
from all letters except `Ü', `ü', `Ö', `ö', `Ä', and `ä'.
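The comparison rules above can be illustrated with a short Python sketch. This is not MySQL's implementation (that is C code inside the server). The names `sort_key' and `like_key' are hypothetical, using Unicode decomposition to strip accents is an assumption made for illustration, and leaving `ß' unchanged for `LIKE' is also an assumption, since the text above does not say how `LIKE' treats it.

```python
import unicodedata

# One -> two character expansions applied when sorting and comparing.
EXPANSIONS = {"ä": "AE", "ö": "OE", "ü": "UE",
              "Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "SS"}
# Letters that keep their umlaut when compared with LIKE.
KEEP_FOR_LIKE = set("äöüÄÖÜ")


def strip_accents(ch):
    """Return ch without combining marks, for example 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def upper_one(ch):
    """Uppercase ch, but never expand it to more than one character."""
    up = ch.upper()
    return up if len(up) == 1 else ch


def sort_key(s):
    """Key for sorting and comparing: expand, strip accents, uppercase."""
    return "".join(EXPANSIONS.get(ch) or upper_one(strip_accents(ch)) for ch in s)


def like_key(s):
    """Key for LIKE: no expansion, and umlauts are kept (in uppercase)."""
    return "".join(
        ch.upper() if ch in KEEP_FOR_LIKE else upper_one(strip_accents(ch))
        for ch in s
    )
```

With these definitions, `sort_key("Müller")' and `sort_key("Mueller")' are equal, so the two spellings sort together, while their `like_key' values differ.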
Non-English Error Messages
--------------------------
`mysqld' can issue error messages in the following languages: Czech,
Danish, Dutch, English (the default), Estonian, French, German, Greek,
Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish,
Portuguese, Romanian, Russian, Slovak, Spanish, and Swedish.
To start `mysqld' with a particular language, use either the
`--language=lang' or the `-L lang' option. For example:
     shell> mysqld --language=swedish
or:
     shell> mysqld --language=/usr/local/share/swedish
Note that all language names are specified in lowercase.
The language files are located (by default) in
`MYSQL_BASE_DIR/share/LANGUAGE/'.
To update the error message file, you should edit the `errmsg.txt' file
and execute the following command to generate the `errmsg.sys' file:
     shell> comp_err errmsg.txt errmsg.sys
If you upgrade to a newer version of MySQL, remember to repeat your
changes with the new `errmsg.txt' file.
Adding a New Character Set
--------------------------
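To add another character set to MySQL, use the following procedure.
Decide if the set is simple or complex. If the character set does not
need to use special string collating routines for sorting and does not
need multi-byte character support, it is simple. If it needs either of
those features, it is complex.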
For example, `latin1' and `danish' are simple character sets, while
`big5' and `czech' are complex character sets.
The following instructions assume that your character set is named
`MYSET'.
For a simple character set do the following:
1. Add MYSET to the end of the `sql/share/charsets/Index' file. Assign
a unique number to it.
2. Create the file `sql/share/charsets/MYSET.conf'. (You can use
`sql/share/charsets/latin1.conf' as a base for this.)
The syntax for the file is very simple:
* Comments start with a '#' character and proceed to the end of
the line.
* Words are separated by arbitrary amounts of whitespace.
* When defining the character set, every word must be a number
in hexadecimal format.
* The `ctype' array takes up the first 257 words. The
`to_lower[]', `to_upper[]' and `sort_order[]' arrays take up
256 words each after that.
*Note Character arrays::.
3. Add the character set name to the `CHARSETS_AVAILABLE' and
`COMPILED_CHARSETS' lists in `configure.in'.
4. Reconfigure, recompile, and test.
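The file syntax described in step 2 can be sketched as a small Python reader. This is only an illustration of the stated rules (comments from '#' to end of line, whitespace-separated hexadecimal words, 257 words for `ctype' followed by three arrays of 256 words); the function name `parse_charset_conf' is hypothetical and this is not the parser MySQL uses.

```python
def parse_charset_conf(text):
    """Split the contents of a simple character set file into its arrays."""
    words = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop the comment, if any
        words.extend(line.split())        # any amount of whitespace separates words
    values = [int(word, 16) for word in words]   # every word is hexadecimal

    expected = 257 + 3 * 256
    if len(values) != expected:
        raise ValueError("expected %d words, found %d" % (expected, len(values)))

    return {
        "ctype": values[0:257],           # first 257 words
        "to_lower": values[257:513],      # next 256 words
        "to_upper": values[513:769],      # next 256 words
        "sort_order": values[769:1025],   # last 256 words
    }
```

A reader like this can be used to check that a new `MYSET.conf' has the right number of words before you reconfigure and recompile.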
For a complex character set do the following:
1. Create the file `strings/ctype-MYSET.c' in the MySQL source
distribution.
2. Add MYSET to the end of the `sql/share/charsets/Index' file.
Assign a unique number to it.
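3. Look at one of the existing `ctype-*.c' files to see what needs to
be defined. Note that the arrays in your file must have names like
`ctype_MYSET', `to_lower_MYSET', and so on. This corresponds to the
arrays in the simple character set. *Note Character arrays::.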
4. Near the top of the file, place a special comment like this:
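     /*
      * .configure. strxfrm_multiply_MYSET=N
      * .configure. mbmaxlen_MYSET=N
      */
The `configure' program uses this comment to include the character
set into the MySQL library automatically.
The `strxfrm_multiply' and `mbmaxlen' lines are explained in the
following sections. Only include them if you need the string collating
functions or the multi-byte character set functions, respectively.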
5. You should then create some of the following functions:
* `my_strncoll_MYSET()'
* `my_strcoll_MYSET()'
* `my_strxfrm_MYSET()'
* `my_like_range_MYSET()'
*Note String collating::.
6. Add the character set name to the `CHARSETS_AVAILABLE' and
`COMPILED_CHARSETS' lists in `configure.in'.
7. Reconfigure, recompile, and test.
The file `sql/share/charsets/README' includes some more instructions.
If you want to have the character set included in the MySQL
distribution, mail a patch to the MySQL internals mailing list. *Note
Mailing-list::.
The Character Definition Arrays
-------------------------------
`to_lower[]' and `to_upper[]' are simple arrays that hold the lowercase
and uppercase characters corresponding to each member of the character
set. For example:
     to_lower['A'] should contain 'a'
     to_upper['a'] should contain 'A'
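`sort_order[]' is a map indicating how characters should be ordered for
comparison and sorting purposes. For many character sets, this is the
same as `to_upper[]' (which means sorting will be
case-insensitive). MySQL will sort characters based on the value of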
`sort_order[character]'. For more complicated sorting rules, see the
discussion of string collating below. *Note String collating::.
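As an illustration of how these three arrays work together, here is a Python sketch that builds them for the ASCII range only and sorts byte strings through `sort_order[]'. Reusing the uppercase table as the sort order and leaving bytes above 127 unchanged are assumptions made for the example; `collation_key' is a hypothetical helper, not a MySQL function.

```python
# Each array has 256 entries and is indexed directly by character value.
to_lower = [ord(chr(c).lower()) if c < 128 else c for c in range(256)]
to_upper = [ord(chr(c).upper()) if c < 128 else c for c in range(256)]

# Assumption: sort by the uppercase form, which makes sorting
# case-insensitive.
sort_order = list(to_upper)


def collation_key(value):
    """Map every byte of value through sort_order[]."""
    return bytes(sort_order[b] for b in value)


names = [b"cherry", b"Banana", b"apple"]
by_byte_value = sorted(names)                     # uppercase letters first
by_sort_order = sorted(names, key=collation_key)  # case-insensitive order
```

Here `by_byte_value' places `Banana' first because 'B' has a lower byte value than 'a', while `by_sort_order' gives apple, Banana, cherry.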
`ctype[]' is an array of bit values, with one element per character.
(Note that `to_lower[]', `to_upper[]', and `sort_order[]' are indexed
by character value, but `ctype[]' is indexed by character value + 1.
This is a legacy convention that makes it possible to handle `EOF'.)
You can find the following bitmask definitions in `m_ctype.h':
     #define _U      01      /* Uppercase */
     #define _L      02      /* Lowercase */
     #define _N      04      /* Numeral (digit) */
     #define _S      010     /* Spacing character */
     #define _P      020     /* Punctuation */
     #define _C      040     /* Control character */
     #define _B      0100    /* Blank */
     #define _X      0200    /* heXadecimal digit */
The `ctype[]' entry for each character should be the union of the
applicable bitmask values that describe the character. For example,
`'A'' is an uppercase character (`_U') as well as a hexadecimal digit
(`_X'), so `ctype['A'+1]' should contain the value:
     _U + _X = 01 + 0200 = 0201
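The same calculation can be carried out for the whole ASCII range with a short Python sketch. The bit values are the ones listed above. The classification rules in the hypothetical `ctype_value' function (for example, marking digits as both `_N' and `_X') are assumptions for illustration and should be checked against an existing character set before being relied on.

```python
import string

# Bit values as listed in m_ctype.h (octal).
_U, _L, _N, _S = 0o01, 0o02, 0o04, 0o10
_P, _C, _B, _X = 0o20, 0o40, 0o100, 0o200


def ctype_value(ch):
    """Union of the bitmask values that describe one ASCII character."""
    value = 0
    if ch in string.ascii_uppercase:
        value |= _U
    if ch in string.ascii_lowercase:
        value |= _L
    if ch in string.digits:
        value |= _N
    if ch in string.hexdigits:
        value |= _X
    if ch in string.punctuation:
        value |= _P
    if ch in "\t\n\v\f\r ":
        value |= _S
    if ch == " ":
        value |= _B
    if ord(ch) < 32 or ord(ch) == 127:
        value |= _C
    return value


# ctype[] has 257 entries: index 0 is reserved for EOF, and the entry
# for a character lives at index (character value + 1).
ctype = [0] * 257
for code in range(128):
    ctype[code + 1] = ctype_value(chr(code))
```

With this table, `ctype[ord("A") + 1]' is 0201 in octal, matching the value worked out above.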
String Collating Support
------------------------
If the sorting rules for your language are too complex to be handled
with the simple `sort_order[]' table, you need to use the string
collating functions.
Right now the best documentation on this is the character sets that are
already implemented. Look at the `big5', `czech', `gbk', `sjis', and
`tis620' character sets for examples.
You must specify the `strxfrm_multiply_MYSET=N' value in the special
comment at the top of the file. `N' should be set to the maximum ratio
by which strings may grow during `my_strxfrm_MYSET' (it must be a
positive integer).
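To make the meaning of `N' concrete, the Python sketch below derives it from a hypothetical per-character transformation table. The table, the `transform' helper, and the `strxfrm_multiply' function are all invented for this example; a real `my_strxfrm_MYSET()' is written in C and may transform strings in other ways.

```python
def strxfrm_multiply(table):
    """Largest number of output characters produced by one input character."""
    longest = max((len(dst) for dst in table.values()), default=1)
    return max(1, longest)   # N must be a positive integer


def transform(s, table):
    """Apply the table; characters not listed are copied unchanged."""
    return "".join(table.get(ch, ch) for ch in s)


# Hypothetical table in which some characters expand to two characters.
TABLE = {"ä": "AE", "ö": "OE", "ü": "UE", "ß": "SS"}
N = strxfrm_multiply(TABLE)
```

For this table `N' is 2, so a buffer of `N * len(s)' characters is always large enough to hold `transform(s, TABLE)'.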
Multi-byte Character Support
----------------------------
If you want to add support for a new character set that includes
multi-byte characters, you need to use the multi-byte character
functions.
Right now the best documentation on this is the character sets that are
already implemented. Look at the `euc_kr', `gb2312', `gbk', `sjis',
and `ujis' character sets for examples. These are implemented in the
`ctype-CHARSET.c' files in the `strings' directory.
You must specify the `mbmaxlen_MYSET=N' value in the special comment at
the top of the source file. `N' should be set to the size in bytes of
the largest character in the set.
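One way to get a feel for the right `N' is to measure encoded character sizes. The Python sketch below does this with Python's own codecs; the codec name `euc_kr' is Python's name for the encoding and is assumed here to correspond to the MySQL character set of the same name. The function `max_bytes_per_char' is hypothetical, and it only measures the sample characters you pass in, not the whole character set.

```python
def max_bytes_per_char(chars, encoding):
    """Size in bytes of the largest encoded character in chars."""
    return max((len(ch.encode(encoding)) for ch in chars), default=0)


# ASCII letters take one byte in EUC-KR, Hangul syllables take two.
ascii_only = max_bytes_per_char("abc", "euc_kr")
with_hangul = max_bytes_per_char("a한", "euc_kr")
```

Here `ascii_only' is 1 and `with_hangul' is 2; the largest value over every character in the set is the value to use for `mbmaxlen_MYSET'.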
Problems With Character Sets
----------------------------
If you try to use a character set that is not compiled into your binary,
you can run into a couple of different problems:
* Your program has an incorrect path to the directory where the
character sets are stored (default
`/usr/local/mysql/share/mysql/charsets'). This can be fixed by using
the `--character-sets-dir' option to the program in question.
* The character set is a multi-byte character set that can't be
loaded dynamically. In this case you have to recompile the
program with support for the character set.
* The character set is a dynamic character set, but you don't have a
configuration file for it. In this case you should install the
configuration file for the character set from a new MySQL distribution.
* Your `Index' file doesn't contain the name for the character set.
     ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf' not found
     (Errcode: 2)
In this case you should either get a new `Index' file or add the
names of any missing character sets by hand.
For `MyISAM' tables, you can check the character set name and number
for a table with `myisamchk -dvv table_name'.
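Several of the problems listed above can be checked with a short script before starting a client. The Python sketch below is a hedged illustration only: the function `diagnose_charset' is hypothetical, and it assumes that the `Index' file lists one character set per line with the name as the first whitespace-separated field and that '#' starts a comment. Verify this against your own `Index' file.

```python
import os


def diagnose_charset(charsets_dir, name):
    """Return a list of likely problems with loading character set name."""
    if not os.path.isdir(charsets_dir):
        return ["character sets directory not found: " + charsets_dir]

    problems = []
    index_path = os.path.join(charsets_dir, "Index")
    if not os.path.isfile(index_path):
        problems.append("no Index file in " + charsets_dir)
    else:
        listed = set()
        with open(index_path) as index_file:
            for line in index_file:
                fields = line.split("#", 1)[0].split()
                if fields:
                    listed.add(fields[0])   # assumption: name is the first field
        if name not in listed:
            problems.append("Index file does not list " + name)

    conf_path = os.path.join(charsets_dir, name + ".conf")
    if not os.path.isfile(conf_path):
        problems.append("missing configuration file " + conf_path)
    return problems
```

An empty list means none of the checked problems were found; it does not rule out a multi-byte character set that must be compiled into the program.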