Adding New Functions to MySQL
=============================
There are two ways to add new functions to MySQL:
* You can add the function through the user-definable function (UDF)
interface. User-definable functions are added and removed
dynamically using the `CREATE FUNCTION' and `DROP FUNCTION'
statements. *Note `CREATE FUNCTION': CREATE FUNCTION.
* You can add the function as a native (built in) MySQL function.
Native functions are compiled into the `mysqld' server and become
available on a permanent basis.
Each method has advantages and disadvantages:
* If you write a user-definable function, you must install the
object file in addition to the server itself. If you compile your
function into the server, you don't need to do that.
* You can add UDFs to a binary MySQL distribution. Native functions
require you to modify a source distribution.
* If you upgrade your MySQL distribution, you can continue to use
your previously installed UDFs. For native functions, you must
repeat your modifications each time you upgrade.
Whichever method you use to add new functions, they may be used just
like native functions such as `ABS()' or `SOUNDEX()'.
`CREATE FUNCTION/DROP FUNCTION' Syntax
--------------------------------------
CREATE [AGGREGATE] FUNCTION function_name RETURNS {STRING|REAL|INTEGER}
SONAME shared_library_name
DROP FUNCTION function_name
A user-definable function (UDF) is a way to extend MySQL with a new
function that works like native (built in) MySQL function such as
`ABS()' and `CONCAT()'.
`AGGREGATE' is a new option for MySQL Version 3.23. An `AGGREGATE'
function works exactly like a native MySQL `GROUP' function like `SUM'
or `COUNT()'.
`CREATE FUNCTION' saves the function's name, type, and shared library
name in the `mysql.func' system table. You must have the `INSERT' and
`DELETE' privileges for the `mysql' database to create and drop
functions.
All active functions are reloaded each time the server starts, unless
you start `mysqld' with the `--skip-grant-tables' option. In this
case, UDF initialisation is skipped and UDFs are unavailable. (An
active function is one that has been loaded with `CREATE FUNCTION' and
not removed with `DROP FUNCTION'.)
For instructions on writing user-definable functions, see *Note Adding
functions::. For the UDF mechanism to work, functions must be written
in C or C++, your operating system must support dynamic loading and you
must have compiled `mysqld' dynamically (not statically).
Note that to make `AGGREGATE' work, you must have a `mysql.func' table
that contains the column `type'. If you do not have this table, you
should run the script `mysql_fix_privilege_tables' to create it.
Adding a New User-definable Function
------------------------------------
5
new functions. Consult this file to see how UDF calling conventions
work.
(with
`dlopen()') from a static linked program, which you would get if you
are using `--with-mysqld-ldflags=-all-static' If you want to use an UDF
that needs to access symbols from `mysqld' (like the `metaphone'
example in `sql/udf_example.cc' that uses `default_charset_info'), you
must link the program with `-rdynamic' (see `man dlopen').
If you are using a precompiled version of the server, use MySQL-Max,
which supports dynamic loading.
between SQL and C/C++ usage, `XXX()' (uppercase) indicates an SQL
function call, and `xxx()' (lowercase) indicates a C/C++ function call.
The C/C++ functions that you write to implement the interface for
`XXX()' are:
`xxx()' (required)
The main function. This is where the function result is computed.
The correspondence between the SQL type and return type of your
C/C++ function is shown here:
*SQL *C/C++
type* type*
`STRING' `char *'
`INTEGER' `long
long'
`REAL' `double'
`xxx_init()' (optional)
The initialisation function for `xxx()'. It can be used to:
* Check the number of arguments to `XXX()'.
* Check that the arguments are of a required type or,
alternatively, tell MySQL to coerce arguments to the types
you want when the main function is called.
* Allocate any memory required by the main function.
* Specify the maximum length of the result.
* Specify (for `REAL' functions) the maximum number of decimals.
* Specify whether the result can be `NULL'.
`xxx_deinit()' (optional)
The deinitialisation function for `xxx()'. It should deallocate
any memory allocated by the initialisation function.
When an SQL statement invokes `XXX()', MySQL calls the initialisation
function `xxx_init()' to let it perform any required setup, such as
argument checking or memory allocation. If `xxx_init()' returns an
error, the SQL statement is aborted with an error message and the main
and deinitialisation functions are not called. Otherwise, the main
function `xxx()' is called once for each row. After all rows have been
processed, the deinitialisation function `xxx_deinit()' is called so it
can perform any required cleanup.
For aggregate functions (like `SUM()'), you must also provide the
following functions:
`xxx_reset()' (required)
Reset sum and insert the argument as the initial value for a new
group.
`xxx_add()' (required)
Add the argument to the old sum.
When using aggregate UDFs, MySQL works the following way:
1. Call `xxx_init()' to let the aggregate function allocate the
memory it will need to store results.
2. Sort the table according to the `GROUP BY' expression.
3. For the first row in a new group, call the `xxx_reset()' function.
4. For each new row that belongs in the same group, call the
`xxx_add()' function.
5. When the group changes or after the last row has been processed,
call `xxx()' to get the result for the aggregate.
6. Repeat 3-5 until all rows has been processed
7. Call `xxx_deinit()' to let the UDF free any memory it has
allocated.
All functions must be thread-safe (not just the main function, but the
initialisation and deinitialisation functions as well). This means that
you are not allowed to allocate any global or static variables that
change! If you need memory, you should allocate it in `xxx_init()' and
free it in `xxx_deinit()'.
UDF Calling Sequences for simple functions
..........................................
The main function should be declared as shown here. Note that the
return type and parameters differ, depending on whether you will
declare the SQL function `XXX()' to return `STRING', `INTEGER', or
`REAL' in the `CREATE FUNCTION' statement:
For `STRING' functions:
char *xxx(UDF_INIT *initid, UDF_ARGS *args,
char *result, unsigned long *length,
char *is_null, char *error);
For `INTEGER' functions:
long long xxx(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error);
For `REAL' functions:
double xxx(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error);
The initialisation and deinitialisation functions are declared like
this:
my_bool xxx_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
void xxx_deinit(UDF_INIT *initid);
The
initialisation function should fill in any members that it wishes to
change. (To use the default for a member, leave it unchanged.):
`my_bool maybe_null'
`xxx_init()' should set `maybe_null' to `1' if `xxx()' can return
`NULL'. The default value is `1' if any of the arguments are
declared `maybe_null'.
`1.34', `1.345', and `1.3', the
default would be 3, because `1.345' has 3 decimals.
`unsigned int max_length'
The maximum length of the string result. The default value
differs depending on the result type of the function. For string
functions, the default is the length of the longest argument. For
integer functions, the default is 21 digits. For real functions,
the default is 13 plus the number of decimals indicated by
`initid->decimals'. (For numeric functions, the length includes
any sign or decimal point characters.)
`char *ptr'
A pointer that the function can use for its own purposes. For
example, functions can use `initid->ptr' to communicate allocated
memory between functions. In `xxx_init()', allocate the memory
and assign it to this pointer:
initid->ptr = allocated_memory;
In `xxx()' and `xxx_deinit()', refer to `initid->ptr' to use or
deallocate the memory.
UDF Calling Sequences for aggregate functions
.............................................
Here follows a description of the different functions you need to define
when you want to create an aggregate UDF function.
Note that the following function is NOT needed or used by MySQL 4.1.1.
You can keep still have define his function if you want to have your
code work with both MySQL 4.0 and MySQL 4.1.1
char *xxx_reset(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error);
This function is called when MySQL finds the first row in a new group.
In the function you should reset any internal summary variables and
then set the given argument as the first argument in the group.
In many cases this is implemented internally by reseting all variables
(for example by calling `xxx_clear()' and then calling `xxx_add()'.
The following function is only required by MySQL 4.1.1 and above:
char *xxx_clear(UDF_INIT *initid, char *is_null, char *error);
This function is called when MySQL needs to reset the summary results.
This will be called at the beginning for each new group but can also be
called to reset the values for a query where there was no matching rows.
`is_null' will be set to point to `CHAR(0)' before calling
`xxx_clear()'.
You can use the `error' pointer to store a byte if something went wrong
.
char *xxx_add(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error);
This function is called for all rows that belongs to the same group,
except for the first row. In this you should add the value in UDF_ARGS
to your internal summary variable.
The `xxx()' function should be declared identical as when you define a
simple UDF function. *Note UDF calling::.
This function is called when all rows in the group has been processed.
You should normally never access the `args' variable here but return
your value based on your internal summary variables.
All argument processing in `xxx_reset()' and `xxx_add()' should be done
identically as for normal UDFs. *Note UDF arguments::.
The return value handling in `xxx()' should be done identically as for
a normal UDF. *Note UDF return values::.
The pointer argument to `is_null' and `error' is the same for all calls
to `xxx_reset()', `xxx_clear()', `xxx_add()' and `xxx()'. You can use
this to remember that you got an error or if the `xxx()' function
should return `NULL'. Note that you should not store a string into
`*error'! This is just a 1 byte flag!
`is_null' is reset for each group (before calling `xxx_clear()' `error'
is never reset.
If `isnull' or `error' are set after `xxx()' then MySQL will return
`NULL' as the result for the group function.
Argument Processing
...................
The `args' parameter points to a `UDF_ARGS' structure that has the
members listed here:
if (args->arg_count != 2)
{
strcpy(message,"XXX() requires two arguments");
return 1;
}
`enum Item_result *arg_type'
The types for each argument. The possible type values are
`STRING_RESULT', `INT_RESULT', and `REAL_RESULT'.
To make sure that arguments are of a given type and return an
error if they are not, check the `arg_type' array in the
initialisation function. For example:
if (args->arg_type[0] != STRING_RESULT ||
args->arg_type[1] != INT_RESULT)
{
strcpy(message,"XXX() requires a string and an integer");
return 1;
}
As an alternative to requiring your function's arguments to be of
particular types, you can use the initialisation function to set
the `arg_type' elements to the types you want. This causes MySQL
to coerce arguments to those types for each call to `xxx()'. For
example, to specify coercion of the first two arguments to string
and integer, do this in `xxx_init()':
args->arg_type[0] = STRING_RESULT;
args->arg_type[1] = INT_RESULT;
`char **args'
`args->args' communicates information to the initialisation
function about the general nature of the arguments your function
was called with. For a constant argument `i', `args->args[i]'
points to the argument value. (See below for instructions on how
to access the value properly.) For a non-constant argument,
`args->args[i]' is `0'. A constant argument is an expression that
uses only constants, such as `3' or `4*7-2' or `SIN(3.14)'. A
non-constant argument is an expression that refers to values that
may change from row to row, such as column names or functions that
are called with non-constant arguments.
For each invocation of the main function, `args->args' contains the
actual arguments that are passed for the row currently being
processed.
Functions can refer to an argument `i' as follows:
* An argument of type `STRING_RESULT' is given as a string
pointer plus a length, to allow handling of binary data or
data of arbitrary length. The string contents are available
as `args->args[i]' and the string length is
`args->lengths[i]'. You should not assume that strings are
null-terminated.
* For an argument of type `INT_RESULT', you must cast
`args->args[i]' to a `long long' value:
long long int_val;
int_val = *((long long*) args->args[i]);
* For an argument of type `REAL_RESULT', you must cast
`args->args[i]' to a `double' value:
double real_val;
real_val = *((double*) args->args[i]);
of the main function, `lengths'
contains the actual lengths of any string arguments that are
passed for the row currently being processed. For arguments of
types `INT_RESULT' or `REAL_RESULT', `lengths' still contains the
maximum length of the argument (as for the initialisation
function).
Return Values and Error Handling
................................
will be returned to the client. The message buffer is
`MYSQL_ERRMSG_SIZE' characters long, but you should try to keep the
message to less than 80 characters so that it fits the width of a
standard terminal screen.
The return value of the main function `xxx()' is the function value, for
`long long' and `double' functions. A string functions should return a
pointer to the result and store the length of the string in the
`length' arguments.
Set these to the contents and length of the return value. For example:
memcpy(result, "result string", 13);
*length = 13;
The `result' buffer that is passed to the calc function is 255 byte
big. If your result fits in this, you don't have to worry about memory
allocation for results.
If your string function needs to return a string longer than 255 bytes,
you must allocate the space for it with `malloc()' in your `xxx_init()'
function or your `xxx()' function and free it in your `xxx_deinit()'
function. You can store the allocated memory in the `ptr' slot in the
`UDF_INIT' structure for reuse by future `xxx()' calls. *Note UDF
calling::.
To indicate a return value of `NULL' in the main function, set
`is_null' to `1':
*is_null = 1;
To indicate an error return in the main function, set the `error'
parameter to `1':
*error = 1;
If `xxx()' sets `*error' to `1' for any row, the function value is
`NULL' for the current row and for any subsequent rows processed by the
statement in which `XXX()' was invoked. (`xxx()' will not even be
called for subsequent rows.) *Note*: in MySQL versions prior to
3.22.10, you should set both `*error' and `*is_null':
*error = 1;
*is_null = 1;
Compiling and Installing User-definable Functions
.................................................
Files implementing UDFs must be compiled and installed on the host
where the server runs. This process is described below for the example
UDF file `udf_example.cc' that is included in the MySQL source
distribution. This file contains the following functions:
* `metaphon()' returns a metaphon string of the string argument.
This is something like a soundex string, but it's more tuned for
English.
* `myfunc_double()' returns the sum of the ASCII values of the
characters in its arguments, divided by the sum of the length of
its arguments.
* `myfunc_int()' returns the sum of the length of its arguments.
* `sequence([const int])' returns an sequence starting from the given
number or 1 if no number has been given.
* `lookup()' returns the IP number for a hostname.
* `reverse_lookup()' returns the hostname for an IP number. The
function may be called with a string `"xxx.xxx.xxx.xxx"' or four
numbers.
A dynamically loadable file should be compiled as a sharable object
file, using a command something like this:
shell> gcc -shared -o udf_example.so myfunc.cc
You can easily find out the correct compiler options for your system by
running this command in the `sql' directory of your MySQL source tree:
shell> make udf_example.o
some
systems, you may need to leave the `-c' on the command.)
exact name
may vary from platform to platform). Copy this file to some directory
searched by the dynamic linker `ld', such as `/usr/lib' or add the
directory in which you placed the shared object to the linker
configuration file (for example, `/etc/ld.so.conf').
variable you
should use on your system. You should set this in `mysql.server' or
`mysqld_safe' startup scripts and restart `mysqld'.
After the library is installed, notify `mysqld' about the new functions
with these commands:
mysql> CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so";
mysql> CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.so";
mysql> CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.so";
mysql> CREATE FUNCTION lookup RETURNS STRING SONAME "udf_example.so";
mysql> CREATE FUNCTION reverse_lookup
-> RETURNS STRING SONAME "udf_example.so";
mysql> CREATE AGGREGATE FUNCTION avgcost
-> RETURNS REAL SONAME "udf_example.so";
Functions can be deleted using `DROP FUNCTION':
mysql> DROP FUNCTION metaphon;
mysql> DROP FUNCTION myfunc_double;
mysql> DROP FUNCTION myfunc_int;
mysql> DROP FUNCTION lookup;
mysql> DROP FUNCTION reverse_lookup;
mysql> DROP FUNCTION avgcost;
the `INSERT'
and `DELETE' privileges for the `mysql' database to create and drop
functions.
You should not use `CREATE FUNCTION' to add a function that has already
been created. If you need to reinstall a function, you should remove
it with `DROP FUNCTION' and then reinstall it with `CREATE FUNCTION'.
You would need to do this, for example, if you recompile a new version
of your function, so that `mysqld' gets the new version. Otherwise,
the server will continue to use the old version.
Active functions are reloaded each time the server starts, unless you
start `mysqld' with the `--skip-grant-tables' option. In this case, UDF
initialisation is skipped and UDFs are unavailable. (An active
function is one that has been loaded with `CREATE FUNCTION' and not
removed with `DROP FUNCTION'.)
Adding a New Native Function
----------------------------
The procedure for adding a new native function is described here. Note
that you cannot add native functions to a binary distribution because
the procedure involves modifying MySQL source code. You must compile
MySQL yourself from a source distribution. Also note that if you
migrate to another version of MySQL (for example, when a new version is
released), you will need to repeat the procedure with the new version.
To add a new native MySQL function, follow these steps:
1. Add one line to `lex.h' that defines the function name in the
`sql_functions[]' array.
2. If the function prototype is simple (just takes zero, one, two or
three arguments), you should in lex.h specify SYM(FUNC_ARG#)
(where # is the number of arguments) as the second argument in the
`sql_functions[]' array and add a function that creates a function
object in `item_create.cc'. Take a look at `"ABS"' and
`create_funcs_abs()' for an example of this.
If the function prototype is complicated (for example takes a
variable number of arguments), you should add two lines to
`sql_yacc.yy'. One indicates the preprocessor symbol that `yacc'
should define (this should be added at the beginning of the file).
Then define the function parameters and add an "item" with these
parameters to the `simple_expr' parsing rule. For an example,
check all occurrences of `ATAN' in `sql_yacc.yy' to see how this
is done.
3. In `item_func.h', declare a class inheriting from `Item_num_func'
or `Item_str_func', depending on whether your function returns a
number or a string.
4. In `item_func.cc', add one of the following declarations, depending
on whether you are defining a numeric or string function:
double Item_func_newname::val()
longlong Item_func_newname::val_int()
String *Item_func_newname::Str(String *str)
If you inherit your object from any of the standard items (like
`Item_num_func'), you probably only have to define one of the above
functions and let the parent object take care of the other
functions. For example, the `Item_str_func' class defines a
`val()' function that executes `atof()' on the value returned by
`::str()'.
arguments. `max_length' is the maximum number of characters
the function may return. This function should also set `maybe_null
= 0' if the main function can't return a `NULL' value. The
function can check if any of the function arguments can return
`NULL' by checking the arguments `maybe_null' variable. You can
take a look at `Item_func_mod::fix_length_and_dec' for a typical
example of how to do this.
All functions must be thread-safe (in other words, don't use any global
or static variables in the functions without protecting them with
mutexes).
If you want to return `NULL', from `::val()', `::val_int()' or
`::str()' you should set `null_value' to 1 and return 0.
For `::str()' object functions, there are some additional
considerations to be aware of:
* The `::str()' function should return the string that holds the
result or `(char*) 0' if the result is `NULL'.
* All current string functions try to avoid allocating any memory
unless absolutely necessary!
[Назад] [Содержание] [Вперед]
| Главная |