C++ CSS HTML Java JavaScript MySQL Oracle PERL PHP SQL Unix VBScript XHTML XML Сети
Optimising `SELECT's and Other Queries (MySQL 4.0)
 
Optimising `SELECT's and Other Queries
======================================

First, one thing that affects all queries: The more complex permission
system setup you have, the more overhead you get.

 check
results in a larger overhead.

If your problem is with some explicit MySQL function, you can always
time this in the MySQL client:

     mysql> SELECT BENCHMARK(1000000,1+1);
     +------------------------+
     | BENCHMARK(1000000,1+1) |
     +------------------------+
     |                      0 |
     +------------------------+
     1 row in set (0.32 sec)

The above shows that MySQL can execute 1,000,000 `+' expressions in
0.32 seconds on a `PentiumII 400MHz'.

All MySQL functions should be very optimised, but there may be some
exceptions, and the `BENCHMARK(loop_count,expression)' is a great tool
to find out if this is a problem with your query.

`EXPLAIN' Syntax (Get Information About a `SELECT')
---------------------------------------------------

         EXPLAIN tbl_name
     or  EXPLAIN SELECT select_options

`EXPLAIN tbl_name' is a synonym for `DESCRIBE tbl_name' or `SHOW
COLUMNS FROM tbl_name'.

When you precede a `SELECT' statement with the keyword `EXPLAIN', MySQL
explains how it would process the `SELECT', providing information about
how tables are joined and in which order.

With the help of `EXPLAIN', you can see when you must add indexes to
tables to get a faster `SELECT' that uses indexes to find the records.

You should frequently run `ANALYZE TABLE' to update table statistics
such as cardinality of keys which can affect the choices the optimiser
makes. *Note ANALYZE TABLE::.

You can also see if the optimiser joins the tables in an optimal order.
To force the optimiser to use a specific join order for a `SELECT'
statement, add a `STRAIGHT_JOIN' clause.

For non-simple joins, `EXPLAIN' returns a row of information for each
table used in the `SELECT' statement. The tables are listed in the order
they would be read.  MySQL resolves all joins using a single-sweep
multi-join method. This means that MySQL reads a row from the first
table, then finds a matching row in the second table, then in the third
table and so on. When all tables are processed, it outputs the selected
columns and backtracks through the table list until a table is found
for which there are more matching rows. The next row is read from this
table and the process continues with the next table.

In MySQL version 4.1 the `EXPLAIN' output was changed to work better
with constructs like `UNION's, subqueries and derived tables. Most
notable is the addition of two new columns: `id' and `select_type'.

Output from `EXPLAIN' consists of the following columns:

`id'
     `SELECT' identifier, the sequential number of this `SELECT' within
     the query.

`select_type'
     Type of `SELECT' clause, which can be any of the following:

    `SIMPLE'
          Simple `SELECT' (without `UNION's or subqueries).

    `PRIMARY'
          Outermost `SELECT'.

    `UNION'
          Second and further `UNION' `SELECT's.

    `DEPENDENT UNION'
          Second and further `UNION' `SELECTS's, dependent on outer
          subquery.

    `SUBQUERY'
          First `SELECT' in subquery.

    `DEPENDENT SUBQUERY'
          First `SELECT', dependent on outer subquery.

    `DERIVED'
          Derived table `SELECT'.

`table'
     The table to which the row of output refers.

`type'
     The join type. The different join types are listed here, ordered
     from best to worst type:

    `system'
          The table has only one row (= system table). This is a
          special case of the `const' join type.

     as constants by
          the rest of the optimiser. `const' tables are very fast as
          they are read only once!

          `const' is used when you compare all parts of a
          `PRIMARY'/`UNIQUE' key with constants:

               SELECT * FROM const_table WHERE primary_key=1;
               
               SELECT * FROM const_table WHERE primary_key_part1=1 and primary_key_part2=2;

    `eq_ref'
          One row will be read from this table for each combination of
          rows from the previous tables.  This is the best possible
          join type, other than the `const' types.  It is used when all
          parts of an index are used by the join and the index is
          `UNIQUE' or a `PRIMARY KEY'.

          `eq_ref' can be used for indexed columns that is compared with
          `='.  The compared item may be a constant or an expression
          that uses columns from tables that are read before this table.

          In the following examples, `ref_table' will be able to use
          `eq_ref'

               SELECT * FROM ref_table,other_table WHERE
               ref_table.key_column=other_table.column;
               
               SELECT * FROM ref_table,other_table WHERE
               ref_table.key_column_part1=other_table.column AND
               ref_table.key_column_part2=1;

    `ref'
          All rows with matching index values will be read from this
          table for each combination of rows from the previous tables.
          `ref' is used if the join uses only a leftmost prefix of the
          key, or if the key is not `UNIQUE' or a `PRIMARY KEY' (in
          other words, if the join cannot select a single row based on
          the key value).  If the key that is used matches only a few
          rows, this join type is good.

          `ref' can be used for indexed columns that is compared with
          `='.

          In the following examples, `ref_table' will be able to use
          `ref'

               SELECT * FROM ref_table WHERE key_column=expr;
               
               SELECT * FROM ref_table,other_table WHERE
               ref_table.key_column=other_table.column;
               
               SELECT * FROM ref_table,other_table WHERE
               ref_table.key_column_part1=other_table.column AND
               ref_table.key_column_part2=1;

    `ref_or_null'
          Like `ref', but with the addition that we will do an extra
          search for rows with `NULL'. *Note IS NULL optimisation::.

               SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL;

          This optimisation is new for MySQL 4.1.1 and is mostly used
          when resolving sub queries.

     the longest key
          part that was used.  The `ref' column will be `NULL' for this
          type.

          `range' can be used for when an key column is compared to a
          constant with `=', `<>', `>', `>=', `<', `<=', `IS NULL',
          `<=>', `BETWEEN' and `IN'.

               SELECT * FROM range_table WHERE key_column = 10;
               
               SELECT * FROM range_table WHERE key_column BETWEEN 10 and 20;
               
               SELECT * FROM range_table WHERE key_column IN (10,20,30);
               
               SELECT * FROM range_table WHERE key_part1= 10 and key_part2 IN (10,20,30);

    `index'
          This is the same as `ALL', except that only the index tree is
          scanned.  This is usually faster than `ALL', as the index
          file is usually smaller than the datafile.

          This can be used when the query only uses columns that are
          part of one index.

    `ALL'
          A full table scan will be done for each combination of rows
          from the previous tables.  This is normally not good if the
          table is the first table not marked `const', and usually
          *very* bad in all other cases. You normally can avoid `ALL'
          by adding more indexes, so that the row can be retrieved
          based on constant values or column values from earlier tables.

`possible_keys'
     The `possible_keys' column indicates which indexes MySQL could use
     to find the rows in this table. Note that this column is totally
     independent of the order of the tables. That means that some of
     the keys in `possible_keys' may not be usable in practice with the
     generated table order.

     If this column is empty, there are no relevant indexes. In this
     case, you may be able to improve the performance of your query by
     examining the `WHERE' clause to see if it refers to some column or
     columns that would be suitable for indexing.  If so, create an
     appropriate index and check the query with `EXPLAIN' again. *Note
     ALTER TABLE::.

     To see what indexes a table has, use `SHOW INDEX FROM tbl_name'.

`key'
     The `key' column indicates the key (index) that MySQL actually
     decided to use. The key is `NULL' if no index was chosen. To force
     MySQL to use an key listed in the `possible_keys' column, use `USE
     KEY/IGNORE KEY' in your query.  *Note SELECT::.

     Also, running `myisamchk --analyze' (*note `myismchk' syntax:
     myisamchk syntax.) or `ANALYZE TABLE' (*note `ANALYZE TABLE':
     ANALYZE TABLE.) on the table will help the optimiser choose better
     indexes.

`key_len'
     The `key_len' column indicates the length of the key that MySQL
     decided to use.  The length is `NULL' if the `key' is `NULL'. Note
     that this tells us how many parts of a multi-part key MySQL will
     actually use.

`ref'
     The `ref' column shows which columns or constants are used with the
     `key' to select rows from the table.

`rows'
     The `rows' column indicates the number of rows MySQL believes it
     must examine to execute the query.

 `Distinct'
          MySQL will not continue searching for more rows for the
          current row combination after it has found the first matching
          row.

    `Not exists'
          MySQL was able to do a `LEFT JOIN' optimisation on the query
          and will not examine more rows in this table for the previous
          row combination after it finds one row that matches the `LEFT
          JOIN' criteria.

          Here is an example for this:

               SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

          Assume that `t2.id' is defined with `NOT NULL'.  In this case
          MySQL will scan `t1' and look up the rows in `t2' through
          `t1.id'. If MySQL finds a matching row in `t2', it knows that
          `t2.id' can never be `NULL', and will not scan through the
          rest of the rows in `t2' that has the same `id'.  In other
          words, for each row in `t1', MySQL only needs to do a single
          lookup in `t2', independent of how many matching rows there
          are in `t2'.

    ``range checked for each record (index map: #)''
          MySQL didn't find a real good index to use. It will, instead,
          for each row combination in the preceding tables, do a check
          on which index to use (if any), and use this index to
          retrieve the rows from the table.  This isn't very fast but
          is faster than having to do a join without an index.

    `Using filesort'
          MySQL will need to do an extra pass to find out how to
          retrieve the rows in sorted order.  The sort is done by going
          through all rows according to the `join type' and storing the
          sort key + pointer to the row for all rows that match the
          `WHERE'. Then the keys are sorted. Finally the rows are
          retrieved in sorted order.

      This can be done
          when all the used columns for the table are part of the same
          index.

    `Using temporary'
          To resolve the query MySQL will need to create a temporary
          table to hold the result.  This typically happens if you do an
          `ORDER BY' on a different column set than you did a `GROUP
          BY' on.

     the table is of type `ALL' or
          `index', you may have something wrong in your query (if you
          don't intend to fetch/examine all rows from the table).

     If you want to get your queries as fast as possible, you should
     look out for `Using filesort' and `Using temporary'.

You can get a good indication of how good a join is by multiplying all
values in the `rows' column of the `EXPLAIN' output. This should tell
you roughly how many rows MySQL must examine to execute the query. This
number is also used when you restrict queries with the `max_join_size'
variable.  *Note Server parameters::.

The following example shows how a `JOIN' can be optimised progressively
using the information provided by `EXPLAIN'.

Suppose you have the `SELECT' statement shown here, that you examine
using `EXPLAIN':

     EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
                 tt.ProjectReference, tt.EstimatedShipDate,
                 tt.ActualShipDate, tt.ClientID,
                 tt.ServiceCodes, tt.RepetitiveID,
                 tt.CurrentProcess, tt.CurrentDPPerson,
                 tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
                 et_1.COUNTRY, do.CUSTNAME
             FROM tt, et, et AS et_1, do
             WHERE tt.SubmitTime IS NULL
                 AND tt.ActualPC = et.EMPLOYID
                 AND tt.AssignedPC = et_1.EMPLOYID
                 AND tt.ClientID = do.CUSTNMBR;

For this example, assume that:

   * The columns being compared have been declared as follows:

     *Table* *Column*   *Column
                        type*
     `tt'    `ActualPC' `CHAR(10)'
     `tt'    `AssignedPC'`CHAR(10)'
     `tt'    `ClientID' `CHAR(10)'
     `et'    `EMPLOYID' `CHAR(15)'
     `do'    `CUSTNMBR' `CHAR(15)'

   * The tables have the indexes shown here:

     
   * The `tt.ActualPC' values aren't evenly distributed.

Initially, before any optimisations have been performed, the `EXPLAIN'
statement produces the following information:

     table type possible_keys                key  key_len ref  rows  Extra
     et    ALL  PRIMARY                      NULL NULL    NULL 74
     do    ALL  PRIMARY                      NULL NULL    NULL 2135
     et_1  ALL  PRIMARY                      NULL NULL    NULL 74
     tt    ALL  AssignedPC,ClientID,ActualPC NULL NULL    NULL 3872
           range checked for each record (key map: 35)

Because `type' is `ALL' for each table, this output indicates that
MySQL is generating a Cartesian product of all the tables!  This will
take quite a long time, as the product of the number of rows in each
table must be examined!  For the case at hand, this is `74 * 2135 * 74
* 3872 = 45,268,558,720' rows.  If the tables were bigger, you can only
imagine how long it would take.

One problem here is that MySQL can't (yet) use indexes on columns
efficiently if they are declared differently.  In this context,
`VARCHAR' and `CHAR' are the same unless they are declared as different
lengths. Because `tt.ActualPC' is declared as `CHAR(10)' and
`et.EMPLOYID' is declared as `CHAR(15)', there is a length mismatch.

To fix this disparity between column lengths, use `ALTER TABLE' to
lengthen `ActualPC' from 10 characters to 15 characters:

     mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);

Now `tt.ActualPC' and `et.EMPLOYID' are both `VARCHAR(15)'.  Executing
the `EXPLAIN' statement again produces this result:

     table type   possible_keys   key     key_len ref         rows    Extra
     tt    ALL    AssignedPC,ClientID,ActualPC NULL NULL NULL 3872    Using where
     do    ALL    PRIMARY         NULL    NULL    NULL        2135
           range checked for each record (key map: 1)
     et_1  ALL    PRIMARY         NULL    NULL    NULL        74
           range checked for each record (key map: 1)
     et    eq_ref PRIMARY         PRIMARY 15      tt.ActualPC 1

This is not perfect, but is much better (the product of the `rows'
values is now less by a factor of 74). This version is executed in a
couple of seconds.

A second alteration can be made to eliminate the column length
mismatches for the `tt.AssignedPC = et_1.EMPLOYID' and `tt.ClientID =
do.CUSTNMBR' comparisons:

     mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
         ->                MODIFY ClientID   VARCHAR(15);

Now `EXPLAIN' produces the output shown here:

     table type   possible_keys   key      key_len ref           rows Extra
     et    ALL    PRIMARY         NULL     NULL    NULL          74
     tt    ref    AssignedPC,     ActualPC 15      et.EMPLOYID   52   Using where
                  ClientID,
                  ActualPC
     et_1  eq_ref PRIMARY         PRIMARY  15      tt.AssignedPC 1
     do    eq_ref PRIMARY         PRIMARY  15      tt.ClientID   1

This is almost as good as it can get.

The remaining problem is that, by default, MySQL assumes that values in
the `tt.ActualPC' column are evenly distributed, and that isn't the
case for the `tt' table.  Fortunately, it is easy to tell MySQL about
this:

     shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt
     shell> mysqladmin refresh

Now the join is perfect, and `EXPLAIN' produces this result:

     table type   possible_keys key     key_len ref           rows Extra
     tt    ALL    AssignedPC    NULL    NULL    NULL          3872 Using where
                  ClientID,
                  ActualPC
     et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC   1
     et_1  eq_ref PRIMARY       PRIMARY 15      tt.AssignedPC 1
     do    eq_ref PRIMARY       PRIMARY 15      tt.ClientID   1

Note that the `rows' column in the output from `EXPLAIN' is an educated
guess from the MySQL join optimiser. To optimise a query, you should
check if the numbers are even close to the truth.  If not, you may get
better performance by using `STRAIGHT_JOIN' in your `SELECT' statement
and trying to list the tables in a different order in the `FROM' clause.

Estimating Query Performance
----------------------------

In most cases you can estimate the performance by counting disk seeks.
For small tables, you can usually find the row in 1 disk seek (as the
index is probably cached).  For bigger tables, you can estimate that
(using B++ tree indexes) you will need: `log(row_count) /
log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) +
1' seeks to find a row.

In MySQL an index block is usually 1024 bytes and the data pointer is
usually 4 bytes. A 500,000 row table with an index length of 3 (medium
integer) gives you: `log(500,000)/log(1024/3*2/(3+4)) + 1' = 4 seeks.

As the above index would require about 500,000 * 7 * 3/2 = 5.2M,
(assuming that the index buffers are filled to 2/3, which is typical)
you will probably have much of the index in memory and you will probably
only need 1-2 calls to read data from the OS to find the row.

For writes, however, you will need 4 seek requests (as above) to find
where to place the new index and normally 2 seeks to update the index
and write the row.

 gets
bigger. After the data gets too big to be cached, things will start to
go much slower until your applications is only bound by disk-seeks
(which increase by log N). To avoid this, increase the index cache as
the data grows. *Note Server parameters::.

Speed of `SELECT' Queries
-------------------------

In general, when you want to make a slow `SELECT ... WHERE' faster, the
first thing to check is whether you can add an index. *Note MySQL
indexes: MySQL indexes. All references between different tables should
usually be done with indexes. You can use the `EXPLAIN' command to
determine which indexes are used for a `SELECT'.  *Note `EXPLAIN':
EXPLAIN.

Some general tips:

   * To help MySQL optimise queries better, run `myisamchk --analyze'
     on a table after it has been loaded with relevant data. This
     updates a value for each index part that indicates the average
     number of rows that have the same value.  (For unique indexes,
     this is always 1, of course.)  MySQL will use this to decide which
     index to choose when you connect two tables with 'a non-constant
     expression'.  You can check the result from the `analyze' run by
     doing `SHOW INDEX FROM table_name' and examining the `Cardinality'
     column.

   * To sort an index and data according to an index, use `myisamchk
     --sort-index --sort-records=1' (if you want to sort on index 1).
     If you have a unique index from which you want to read all records
     in order according to that index, this is a good way to make that
     faster.  Note, however, that this sorting isn't written optimally
     and will take a long time for a large table!

How MySQL Optimises `WHERE' Clauses
-----------------------------------

The `WHERE' optimisations are put in the `SELECT' part here because
they are mostly used with `SELECT', but the same optimisations apply for
`WHERE' in `DELETE' and `UPDATE' statements.

Also note that this section is incomplete. MySQL does many
optimisations, and we have not had time to document them all.

Some of the optimisations performed by MySQL are listed here:

   * Removal of unnecessary parentheses:
             ((a AND b) AND c OR (((a AND b) AND (c AND d))))
          -> (a AND b AND c) OR (a AND b AND c AND d)

   * Constant folding:
             (a<b AND b=c) AND a=5
          -> b>5 AND b=c AND a=5

   * Constant condition removal (needed because of constant folding):
             (B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
          -> B=5 OR B=6

   * Constant expressions used by indexes are evaluated only once.

   * `COUNT(*)' on a single table without a `WHERE' is retrieved
     directly from the table information for `MyISAM' and `HEAP' tables.
     This is also done for any `NOT NULL' expression when used with
     only one table.

   * Early detection of invalid constant expressions. MySQL quickly
     detects that some `SELECT' statements are impossible and returns
     no rows.

   * `HAVING' is merged with `WHERE' if you don't use `GROUP BY' or
     group functions (`COUNT()', `MIN()'...).

   * For each sub-join, a simpler `WHERE' is constructed to get a fast
     `WHERE' evaluation for each sub-join and also to skip records as
     soon as possible.

   * All constant tables are read first, before any other tables in the
     query.  A constant table is:
        - An empty table or a table with 1 row.

        - A table that is used with a `WHERE' clause on a `UNIQUE'
          index, or a `PRIMARY KEY', where all index parts are used
          with constant expressions and the index parts are defined as
          `NOT NULL'.
     All the following tables are used as constant tables:
          mysql> SELECT * FROM t WHERE primary_key=1;
          mysql> SELECT * FROM t1,t2
              ->          WHERE t1.primary_key=1 AND t2.primary_key=t1.id;

    when
     joining.

   
     table is created.

   * If you use `SQL_SMALL_RESULT', MySQL will use an in-memory
     temporary table.

   * Each table index is queried, and the best index that spans fewer
     than 30% of the rows is used. If no such index can be found, a
     quick table scan is used.

   

   * Before each record is output, those that do not match the `HAVING'
     clause are skipped.

Some examples of queries that are very fast:

     mysql> SELECT COUNT(*) FROM tbl_name;
     mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
     mysql> SELECT MAX(key_part2) FROM tbl_name
         ->        WHERE key_part_1=constant;
     mysql> SELECT ... FROM tbl_name
         ->        ORDER BY key_part1,key_part2,... LIMIT 10;
     mysql> SELECT ... FROM tbl_name
         ->        ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10;

The following queries are resolved using only the index tree (assuming
the indexed columns are numeric):

     mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
     mysql> SELECT COUNT(*) FROM tbl_name
         ->        WHERE key_part1=val1 AND key_part2=val2;
     mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1;

The following queries use indexing to retrieve the rows in sorted order
without a separate sorting pass:

     mysql> SELECT ... FROM tbl_name
         ->            ORDER BY key_part1,key_part2,... ;
     mysql> SELECT ... FROM tbl_name
         ->            ORDER BY key_part1 DESC,key_part2 DESC,... ;

How MySQL Optimises `IS NULL'
-----------------------------

MySQL can do the same optimisation on `column IS NULL' as it can do
with `column = constant_value'.  For example, MySQL can use indexes and
ranges to search for `NULL' with `IS NULL'.

     SELECT * FROM table_name WHERE key_col IS NULL;
     
     SELECT * FROM table_name WHERE key_col <=> NULL;
     
     SELECT * FROM table_name WHERE key_col=# OR key_col=# OR key_col IS NULL

If you use `column_name IS NULL' on a `NOT NULL' in a WHERE clause on
table that is not used `OUTER JOIN' that expression will be optimised
away.



This optimisation can handle one `IS NULL' for any key part.

Some examples of queries that are optimised (assuming key on t2 (a,b)):

     SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;
     
     SELECT * FROM t1,t2 WHERE t1.a=t2.a OR t2.a IS NULL;
     
     SELECT * FROM t1,t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;
     
     SELECT * FROM t1,t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);
     
     SELECT * FROM t1,t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...);

`ref_or_null' works by first doing a read on the reference key and
after that a separate search after rows with NULL key.

Note that the optimisation can only handle one `IS NULL' level.

     SELECT * FROM t1,t2 where (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL);

Int the above case MySQL will only use key lookups on the part
`(t1.a=t2.a AND t2.a IS NULL)' and not be able to use the key part on
`b'.

How MySQL Optimises `DISTINCT'
------------------------------

`DISTINCT' combined with `ORDER BY' will in many cases need a temporary
table.



When combining `LIMIT row_count' with `DISTINCT', MySQL will stop as
soon as it finds `row_count' unique rows.

If you don't use columns from all used tables, MySQL will stop the
scanning of the not used tables as soon as it has found the first match.

     SELECT DISTINCT t1.a FROM t1,t2 where t1.a=t2.a;

In the case, assuming `t1' is used before `t2' (check with `EXPLAIN'),
then MySQL will stop reading from `t2' (for that particular row in
`t1') when the first row in `t2' is found.

How MySQL Optimises `LEFT JOIN' and `RIGHT JOIN'
------------------------------------------------

`A LEFT JOIN B join_condition' in MySQL is implemented as follows:

   * The table `B' is set to be dependent on table `A' and all tables
     that `A' is dependent on.

   * The table `A' is set to be dependent on all tables (except `B')
     that are used in the `LEFT JOIN' condition.

   * The `LEFT JOIN' condition is used to decide how we should retrieve
     rows from table B. (In other words, any condition in the `WHERE'
     clause is not used).

   

   * All standard `WHERE' optimisations are done.

   

   * If you use `LEFT JOIN' to find rows that don't exist in some table
     and you have the following test: `column_name IS NULL' in the
     `WHERE' part, where column_name is a column that is declared as
     `NOT NULL', then MySQL will stop searching after more rows (for a
     particular key combination) after it has found one row that
     matches the `LEFT JOIN' condition.

`RIGHT JOIN' is implemented analogously as `LEFT JOIN'.

 table
permutations to check.

Note that the above means that if you do a query of type:

     SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
              WHERE b.key=d.key

MySQL will do a full scan on `b' as the `LEFT JOIN' will force it to be
read before `d'.

The fix in this case is to change the query to:

     SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
              WHERE b.key=d.key

Starting from 4.0.14 MySQL does the following left join optimisation:

If the `WHERE' condition is always be false for the generated `NULL'
row, the `LEFT JOIN' is changed to a normal join.

For example, in the following query the `WHERE' clause would be false
if t2.column would be `NULL' so it's safe to convert to a normal join.

     SELECT * FROM t1 LEFT t2 ON (column) WHERE t2.column2 =5;
     ->
     SELECT * FROM t1,t2 WHERE t2.column2=5 AND t1.column=t2.column;

this can be made faster as MySQL can now use table `t2' before table
`t1' if this would result in a better query plan.  To force a specific
table order one should use `STRAIGHT JOIN'.

How MySQL Optimises `ORDER BY'
------------------------------

In some cases MySQL can uses index to satisfy an `ORDER BY' or `GROUP
BY' request without doing any extra sorting.

 The
following queries will use the index to resolve the `ORDER BY' / `GROUP
BY' part:

      t1 ORDER BY key_part1 DESC,key_part2 DESC
     SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC

Some cases where MySQL can *not* use indexes to resolve the `ORDER BY':
(Note that MySQL will still use indexes to find the rows that matches
the `WHERE' clause):

   * You are doing an `ORDER BY' on different keys:

     `SELECT * FROM t1 ORDER BY key1,key2'

   * You are doing an `ORDER BY' using non-consecutive key parts.

     `SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2'

   * You are mixing `ASC' and `DESC'.

     `SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 ASC'

   * The key used to fetch the rows are not the same one that is used to
     do the `ORDER BY':

     `SELECT * FROM t1 WHERE key2=constant ORDER BY key1'

    output
     which doesn't use a `const' row fetch method).

   * You have different `ORDER BY' and `GROUP BY' expressions.

   * The used table index is an index type that doesn't store rows in
     order.  (Like the `HASH' index in `HEAP' tables).

In the cases where MySQL have to sort the result, it uses the following
algorithm:

   * Read all rows according to key or by table scanning.  Rows that
     don't match the `WHERE' clause are skipped.

   * Store the sort-key in a buffer (of size `sort_buffer').

    file
     is created)

   * Repeat the above until all rows have been read.

   * Do a multi-merge of up to `MERGEBUFF' (7) regions to one block in
     another temporary file.  Repeat until all blocks from the first
     file are in the second file.

   * Repeat the following until there is less than `MERGEBUFF2' (15)
     blocks left.

   * On the last multi-merge, only the pointer to the row (last part of
     the sort-key) is written to a result file.

    sort these
     and then we read the rows in the sorted order into a row buffer
     (`read_rnd_buffer_size') .

 BY'.
*Note EXPLAIN::.

If you want to have a higher `ORDER BY' speed, you should first see if
you can get MySQL to use indexes instead of having to do an extra
sorting phase. If this is not possible, then you can do:

   * Increase the size of the `sort_buffer_size' variable.

   * Increase the size of the `read_rnd_buffer_size' variable.

   * Change `tmpdir' to point to a dedicated disk with lots of empty
     space.  If you use MySQL 4.1 or later you can spread load between
     several physical disks by setting `tmpdir' to a list of paths
     separated by colon `:' (semicolon `;' on Windows). They will be
     used in round-robin fashion.  *Note:* These paths should end up on
     different *physical* disks, not different partitions of the same
     disk.

 any speed
penalty, though the sorting still occurs.  If a query includes `GROUP
BY' but you want to avoid the overhead of sorting the result, you can
supress sorting by specifying `ORDER BY NULL':

     INSERT INTO foo SELECT a,COUNT(*) FROM bar GROUP BY a ORDER BY NULL;

How MySQL Optimises `LIMIT'
---------------------------

In some cases MySQL will handle the query differently when you are
using `LIMIT row_count' and not using `HAVING':

   * If you are selecting only a few rows with `LIMIT', MySQL will use
     indexes in some cases when it normally would prefer to do a full
     table scan.

   * If you use `LIMIT row_count' with `ORDER BY', MySQL will end the
     sorting as soon as it has found the first `row_count' lines
     instead of sorting the whole table.

   * When combining `LIMIT row_count' with `DISTINCT', MySQL will stop
     as soon as it finds `row_count' unique rows.

   * In some cases a `GROUP BY' can be resolved by reading the key in
     order (or do a sort on the key) and then calculate summaries until
     the key value changes.  In this case `LIMIT row_count' will not
     calculate any unnecessary `GROUP BY's.

   * As soon as MySQL has sent the first `#' rows to the client, it
     will abort the query (if you are not using `SQL_CALC_FOUND_ROWS').

   * `LIMIT 0' will always quickly return an empty set.  This is useful
     to check the query and to get the column types of the result
     columns.

   * When the server uses temporary tables to resolve the query, the
     `LIMIT row_count' is used to calculate how much space is required.

Speed of `INSERT' Queries
-------------------------

The time to insert a record consists approximately of:

   * Connect:                 (3)

   * Sending query to server: (2)

   * Parsing query:           (2)

   * Inserting record:        (1 x size of record)

   * Inserting indexes:       (1 x number of indexes)

   * Close:                   (1)

where the numbers are somewhat proportional to the overall time. This
does not take into consideration the initial overhead to open tables
(which is done once for each concurrently running query).

The size of the table slows down the insertion of indexes by log N
(B-trees).

Some ways to speed up inserts:

   * If you are inserting many rows from the same client at the same
     time, use multiple value lists `INSERT' statements. This is much
     faster (many times in some cases) than using separate `INSERT'
     statements.  If you are adding data to non-empty table, you may
     tune up the `bulk_insert_buffer_size' variable to make it even
     faster.  *Note `bulk_insert_buffer_size': SHOW VARIABLES.

   * If you are inserting a lot of rows from different clients, you can
     get higher speed by using the `INSERT DELAYED' statement. *Note
     `INSERT': INSERT.

   * Note that with `MyISAM' tables you can insert rows at the same time
     `SELECT's are running if there are no deleted rows in the tables.

   * When loading a table from a text file, use `LOAD DATA INFILE'. This
     is usually 20 times faster than using a lot of `INSERT' statements.
     *Note `LOAD DATA': LOAD DATA.

   * It is possible with some extra work to make `LOAD DATA INFILE' run
     even faster when the table has many indexes. Use the following
     procedure:

       1. Optionally create the table with `CREATE TABLE'. For example,
          using `mysql' or Perl-DBI.

       2. Execute a `FLUSH TABLES' statement or the shell command
          `mysqladmin flush-tables'.

       3. Use `myisamchk --keys-used=0 -rq /path/to/db/tbl_name'. This
          will remove all usage of all indexes from the table.

       4. Insert data into the table with `LOAD DATA INFILE'. This will
          not update any indexes and will therefore be very fast.

       5. If you are going to only read the table in the future, run
          `myisampack' on it to make it smaller. *Note Compressed
          format::.

        it avoids lots of disk seeks. The resulting index
          tree is also perfectly balanced.

       7. Execute a `FLUSH TABLES' statement or the shell command
          `mysqladmin flush-tables'.

     Note that `LOAD DATA INFILE' also does the above optimisation if
     you insert into an empty table; the main difference with the above
     procedure is that you can let `myisamchk' allocate much more
     temporary memory for the index creation that you may want MySQL to
     allocate for every index recreation.

     Since MySQL 4.0 you can also use `ALTER TABLE tbl_name DISABLE
     KEYS' instead of `myisamchk --keys-used=0 -rq
     /path/to/db/tbl_name' and `ALTER TABLE tbl_name ENABLE KEYS'
     instead of `myisamchk -r -q /path/to/db/tbl_name'. This way you
     can also skip `FLUSH TABLES' steps.

   * You can speed up insertions that is done over multiple statements
     by locking your tables:

          mysql> LOCK TABLES a WRITE;
          mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33);
          mysql> INSERT INTO a VALUES (8,26),(6,29);
          mysql> UNLOCK TABLES;

     The main speed difference is that the index buffer is flushed to
     disk only once, after all `INSERT' statements have completed.
     Normally there would be as many index buffer flushes as there are
     different `INSERT' statements. Locking is not needed if you can
     insert all rows with a single statement.

     For transactional tables, you should use `BEGIN/COMMIT' instead of
     `LOCK TABLES' to get a speedup.

     Locking will also lower the total time of multi-connection tests,
     but the maximum wait time for some threads will go up (because
     they wait for locks).  For example:

          thread 1 does 1000 inserts
          thread 2, 3, and 4 does 1 insert
          thread 5 does 1000 inserts

          
     in a row.  If you do very many inserts in a row, you could do a
     `LOCK TABLES' followed by an `UNLOCK TABLES' once in a while
     (about each 1000 rows) to allow other threads access to the table.
     This would still result in a nice performance gain.

     Of course, `LOAD DATA INFILE' is much faster for loading data.

To get some more speed for both `LOAD DATA INFILE' and `INSERT',
enlarge the key buffer. *Note Server parameters::.

Speed of `UPDATE' Queries
-------------------------

Update queries are optimised as a `SELECT' query with the additional
overhead of a write. The speed of the write is dependent on the size of
the data that is being updated and the number of indexes that are
updated.  Indexes that are not changed will not be updated.

Also, another way to get fast updates is to delay updates and then do
many updates in a row later. Doing many updates in a row is much quicker
than doing one at a time if you lock the table.

Note that, with dynamic record format, updating a record to a longer
total length may split the record.  So if you do this often, it is very
important to `OPTIMIZE TABLE' sometimes.  *Note `OPTIMIZE TABLE':
OPTIMIZE TABLE.

Speed of `DELETE' Queries
-------------------------

If you want to delete all rows in the table, you should use `TRUNCATE
TABLE table_name'. *Note TRUNCATE::.

The time to delete a record is exactly proportional to the number of
indexes. To delete records more quickly, you can increase the size of
the index cache. *Note Server parameters::.

Other Optimisation Tips
-----------------------

Unsorted tips for faster systems:

   * Use persistent connections to the database to avoid the connection
     overhead. If you can't use persistent connections and you are
     doing a lot of new connections to the database, you may want to
     change the value of the `thread_cache_size' variable. *Note Server
     parameters::.

   * Try to avoid complex `SELECT' queries on `MyISAM' tables that are
     updated a lot. This is to avoid problems with table locking.

   * The new `MyISAM' tables can insert rows in a table without deleted
     rows at the same time another table is reading from it.  If this
     is important for you, you should consider methods where you don't
     have to delete rows or run `OPTIMIZE TABLE' after you have deleted
     a lot of rows.

   * Use `ALTER TABLE ... ORDER BY expr1,expr2...' if you mostly
     retrieve rows in `expr1,expr2...' order.  By using this option
     after big changes to the table, you may be able to get higher
     performance.

   * In some cases it may make sense to introduce a column that is
     'hashed' based on information from other columns. If this column
     is short and reasonably unique it may be much faster than a big
     index on many columns. In MySQL it's very easy to use this extra
     column: `SELECT * FROM table_name WHERE hash=MD5(CONCAT(col1,col2))
     AND col_1='constant' AND col_2='constant''

   * For tables that change a lot you should try to avoid all `VARCHAR'
     or `BLOB' columns. You will get dynamic row length as soon as you
     are using a single `VARCHAR' or `BLOB' column. *Note Table types::.

   * It's not normally useful to split a table into different tables
     just because the rows gets 'big'. To access a row, the biggest
     performance hit is the disk seek to find the first byte of the
     row. After finding the data most new disks can read the whole row
     fast enough for most applications. The only cases where it really
     matters to split up a table is if it's a dynamic row size table
     (see above) that you can change to a fixed row size, or if you
     very often need to scan the table and don't need most of the
     columns. *Note Table types::.

   * If you very often need to calculate things based on information
     from a lot of rows (like counts of things), it's probably much
     better to introduce a new table and update the counter in real
     time. An update of type `UPDATE table SET count=count+1 WHERE
     index_column=constant' is very fast!

     This is really important when you use MySQL table types like
     MyISAM and ISAM that only have table locking (multiple readers /
     single writers). This will also give better performance with most
     databases, as the row locking manager in this case will have less
     to do.

   * If you need to collect statistics from big log tables, use summary
     tables instead of scanning the whole table. Maintaining the
     summaries should be much faster than trying to do statistics
     'live'. It's much faster to regenerate new summary tables from the
     logs when things change (depending on business decisions) than to
     have to change the running application!

   * If possible, one should classify reports as 'live' or
     'statistical', where data needed for statistical reports are only
     generated based on summary tables that are generated from the
     actual data.

   * Take advantage of the fact that columns have default values. Insert
     values explicitly only when the value to be inserted differs from
     the default. This reduces the parsing that MySQL need to do and
     improves the insert speed.

   * In some cases it's convenient to pack and store data into a blob.
     In this case you have to add some extra code in your application
     to pack/unpack things in the blob, but this may save a lot of
     accesses at some stage.  This is practical when you have data that
     doesn't conform to a static table structure.

   * Normally you should try to keep all data non-redundant (what is
     called 3rd normal form in database theory), but you should not be
     afraid of duplicating things or creating summary tables if you
     need these to gain more speed.

   * Stored procedures or UDF (user-defined functions) may be a good
     way to get more performance.  In this case you should, however,
     always have a way to do this some other (slower) way if you use
     some database that doesn't support this.

   * You can always gain something by caching queries/answers in your
     application and trying to do many inserts/updates at the same
     time.  If your database supports lock tables (like MySQL and
     Oracle), this should help to ensure that the index cache is only
     flushed once after all updates.

   * Use `INSERT /*! DELAYED */' when you do not need to know when your
     data is written. This speeds things up because many records can be
     written with a single disk write.

   * Use `INSERT /*! LOW_PRIORITY */' when you want your selects to be
     more important.

   * Use `SELECT /*! HIGH_PRIORITY */' to get selects that jump the
     queue. That is, the select is done even if there is somebody
     waiting to do a write.

   * Use the multi-line `INSERT' statement to store many rows with one
     SQL command (many SQL servers supports this).

   * Use `LOAD DATA INFILE' to load bigger amounts of data. This is
     faster than normal inserts and will be even faster when `myisamchk'
     is integrated in `mysqld'.

   * Use `AUTO_INCREMENT' columns to make unique values.

   * Use `OPTIMIZE TABLE' once in a while to avoid fragmentation when
     using a dynamic table format. *Note `OPTIMIZE TABLE': OPTIMIZE
     TABLE.

   * Use `HEAP' tables to get more speed when possible. *Note Table
     types::.

    better at
     caching files than database contents. So it it's much easier to
     get a fast system if you are using files.

   * Use in memory tables for non-critical data that are accessed often
     (like information about the last shown banner for users that don't
     have cookies).

   * Columns with identical information in different tables should be
     declared identical and have identical names. Before Version 3.23
     you got slow joins otherwise.

     

   * If you need really high speed, you should take a look at the
     low-level interfaces for data storage that the different SQL
     servers support!  For example, by accessing the MySQL `MyISAM'
     directly, you could get a speed increase of 2-5 times compared to
     using the SQL interface.  To be able to do this the data must be
     on the same server as the application, and usually it should only
     be accessed by one process (because external file locking is
     really slow).  One could eliminate the above problems by
     introducing low-level `MyISAM' commands in the MySQL server (this
     could be one easy way to get more performance if needed).  By
     carefully designing the database interface, it should be quite
     easy to support this types of optimisation.

   * In many cases it's faster to access data from a database (using a
     live connection) than accessing a text file, just because the
     database is likely to be more compact than the text file (if you
     are using numerical data), and this will involve fewer disk
     accesses.  You will also save code because you don't have to parse
     your text files to find line and column boundaries.

   * You can also use replication to speed things up. *Note
     Replication::.

   * Declaring a table with `DELAY_KEY_WRITE=1' will make the updating
     of indexes faster, as these are not logged to disk until the file
     is closed.  The downside is that you should run `myisamchk' on
     these tables before you start `mysqld' to ensure that they are
     okay if something killed `mysqld' in the middle.  As the key
     information can always be generated from the data, you should not
     lose anything by using `DELAY_KEY_WRITE'.

[Назад] [Содержание] [Вперед]

Главная