C++ CSS HTML Java JavaScript MySQL Oracle PERL PHP SQL Unix VBScript XHTML XML Сети
MySQL Optimisation (MySQL 4.0)
 
MySQL Optimisation
******************


the more optimal you want your system to become the more you will have
to know about it.

This chapter will try to explain and give some examples of different
ways to optimise MySQL.  Remember, however, that there are always some
(increasingly harder) additional ways to make the system even faster.

Optimisation Overview
=====================

The most important part for getting a system fast is of course the basic
design. You also need to know what kinds of things your system will be
doing, and what your bottlenecks are.

The most common bottlenecks are:
   * Disk seeks.  It takes time for the disk to find a piece of data.
     With modern disks in 1999, the mean time for this is usually lower
     than 10ms, so we can in theory do about 100 seeks a second. This
     time improves slowly with new disks and is very hard to optimise
     for a single table. The way to optimise this is to spread the data
     on more than one disk.

   * Disk reading/writing.  When the disk is at the correct position we
     need to read the data. With modern disks in 1999, one disk
     delivers something like 10-20 MB. This is easier to optimise than
     seeks because you can read in parallel from multiple disks.

   * CPU cycles.  When we have the data in main memory (or if it
     already were there) we need to process it to get to our result.
     Having small tables compared to the memory is the most common
     limiting factor. But then, with small tables speed is usually not
     the problem.

   * Memory bandwidth.  When the CPU needs more data than can fit in
     the CPU cache the main memory bandwidth becomes a bottleneck. This
     is an uncommon bottleneck for most systems, but one should be
     aware of it.

MySQL Design Limitations/Tradeoffs
----------------------------------

When using the MyISAM storage engine, MySQL uses extremely fast table
locking (multiple readers / single writers). The biggest problem with
this table type is a if you have a mix of a steady stream of updates and
slow selects on the same table.  If this is a problem with some tables,
you can use another table type for these. *Note Table types::.

MySQL can work with both transactional and non-transactional tables.
To be able to work smoothly with non-transactional tables (which can't
rollback if something goes wrong), MySQL has the following rules:

   * All columns have default values.

   * If you insert a 'wrong' value in a column like a `NULL' in a `NOT
     NULL' column or a too big numerical value in a numerical column,
     MySQL will instead of giving an error instead set the column to
     the 'best possible value'.  For numerical values this is 0, the
     smallest possible values or the largest possible value. For
     strings this is either the empty string or the longest possible
     string that can be in the column.

   * All calculated expressions returns a value that can be used
     instead of signaling an error condition. For example 1/0 returns
     `NULL'

For more information about this, see *Note Constraints::.

The above means that one should not use MySQL to check fields content,
but one should do this in the application.

Portability
-----------

 If you want an
application that is fast with many databases it becomes even harder!

To make a complex application portable you need to choose a number of
SQL servers that it should work with.

 tests far from everything possible, but it is still
comprehensive with about 450 things tested.

For example, you shouldn't have column names longer than 18 characters
if you want to be able to use Informix or DB2.

Both the MySQL benchmarks and `crash-me' programs are very
database-independent.  By taking a look at how we have handled this, you
can get a feeling for what you have to do to write your application
database-independent.  The benchmarks themselves can be found in the
`sql-bench' directory in the MySQL source distribution. They are
written in Perl with DBI database interface (which solves the access
part of the problem).

See `http://www.mysql.com/information/benchmarks.html' for the results
from this benchmark.

As you can see in these results, all databases have some weak points.
That is, they have different design compromises that lead to different
behaviour.

 readers/writers
on the same table. Oracle, on the other hand, has a big problem when
you try to access rows that you have recently updated (until they are
flushed to disk). Transaction databases in general are not very good at
generating summary tables from log tables, as in this case row locking
is almost useless.

To get your application _really_ database-independent, you need to
define an easy extendable interface through which you manipulate your
data. As C++ is available on most systems, it makes sense to use a C++
classes interface to the databases.

If you use some specific feature for some database (like the `REPLACE'
command in MySQL), you should code a method for the other SQL servers
to implement the same feature (but slower).  With MySQL you can use the
`/*!  */' syntax to add MySQL-specific keywords to a query.  The code
inside `/**/' will be treated as a comment (ignored) by most other SQL
servers.

 old
results 'expire' after a while, you can keep the cache reasonably
fresh.  This provides a method to handle high load spikes, in which case
you can dynamically increase the cache and set the expire timeout higher
until things get back to normal.

In this case the table creation information should contain information
of the initial size of the cache and how often the table should normally
be refreshed.

What Have We Used MySQL For?
----------------------------

During MySQL initial development, the features of MySQL were made to
fit our largest customer. They handle data warehousing for a couple of
the biggest retailers in Sweden.

 affecting their
customers.

The data is quite huge (about 7 million summary transactions per month),
and we have data for 4-10 years that we need to present to the users.
We got weekly requests from the customers that they want to get
'instant' access to new reports from this data.

We solved this by storing all information per month in compressed
'transaction' tables. We have a set of simple macros (script) that
generates summary tables grouped by different criteria (product group,
customer id, store ...) from the transactional tables.  The reports are
web pages that are dynamically generated by a small Perl script that
parses a web page, executes the SQL statements in it, and inserts the
results. We would have used PHP or mod_perl instead but they were not
available at that time.

For graphical data we wrote a simple tool in `C' that can produce GIFs
based on the result of an SQL query (with some processing of the
result). This is also dynamically executed from the Perl script that
parses the `HTML' files.

In most cases a new report can simply be done by copying an existing
script and modifying the SQL query in it.  In some cases, we will need
to add more fields to an existing summary table or generate a new one,
but this is also quite simple, as we keep all transactions tables on
disk.  (Currently we have at least 50G of transactions tables and 200G
of other customer data.)

We also let our customers access the summary tables directly with ODBC
so that the advanced users can themselves experiment with the data.

We haven't had any problems handling this with quite modest Sun Ultra
SPARCstation (2x200 Mhz). We recently upgraded one of our servers to a 2
CPU 400 Mhz UltraSPARC, and we are now planning to start handling
transactions on the product level, which would mean a ten-fold increase
of data. We think we can keep up with this by just adding more disk to
our systems.

We are also experimenting with Intel-Linux to be able to get more CPU
power cheaper. Now that we have the binary portable database format (new
in Version 3.23), we will start to use this for some parts of the
application.

Our initial feelings are that Linux will perform much better on
low-to-medium load and Solaris will perform better when you start to
get a high load because of extreme disk IO, but we don't yet have
anything conclusive about this. After some discussion with a Linux
Kernel developer, this might be a side effect of Linux giving so much
resources to the batch job that the interactive performance gets very
low. This makes the machine feel very slow and unresponsive while big
batches are going. Hopefully this will be better handled in future
Linux Kernels.

The MySQL Benchmark Suite
-------------------------

This should contain a technical description of the MySQL benchmark
suite (and `crash-me'), but that description is not written yet.
Currently, you can get a good idea of the benchmark by looking at the
code and results in the `sql-bench' directory in any MySQL source
distributions.

This benchmark suite is meant to be a benchmark that will tell any user
what things a given SQL implementation performs well or poorly at.

Note that this benchmark is single threaded, so it measures the minimum
time for the operations. We plan to in the future add a lot of
multi-threaded tests to the benchmark suite.

For example, (run on the same NT 4.0 machine):

*Reading 2000000 rows by  *Seconds**Seconds*
index*                            
mysql                     367     249
mysql_odbc                464     
db2_odbc                  1206    
informix_odbc             121126  
ms-sql_odbc               1634    
oracle_odbc               20800   
solid_odbc                877     
sybase_odbc               17614   

*Inserting (350768)       *Seconds**Seconds*
rows*                             
mysql                     381     206
mysql_odbc                619     
db2_odbc                  3460    
informix_odbc             2692    
ms-sql_odbc               4012    
oracle_odbc               11291   
solid_odbc                1801    
sybase_odbc               4802    

In the above test MySQL was run with a 8M index cache.

We have gathered some more benchmark results at
`http://www.mysql.com/information/benchmarks.html'.

Note that Oracle is not included because they asked to be removed. All
Oracle benchmarks have to be passed by Oracle! We believe that makes
Oracle benchmarks *very* biased because the above benchmarks are
supposed to show what a standard installation can do for a single
client.

To run the benchmark suite, you have to download a MySQL source
distribution, install the Perl DBI driver, the Perl DBD driver for the
database you want to test and then do:

     cd sql-bench
     perl run-all-tests --server=#

where # is one of supported servers. You can get a list of all options
and supported servers by doing `run-all-tests --help'.

`crash-me' tries to determine what features a database supports and
what its capabilities and limitations are by actually running queries.
For example, it determines:

   * What column types are supported

   * How many indexes are supported

   * What functions are supported

   * How big a query can be

   * How big a `VARCHAR' column can be

We can find the result from `crash-me' on a lot of different databases
at `http://www.mysql.com/information/crash-me.php'.

Using Your Own Benchmarks
-------------------------

 the next
bottleneck (and so on).  Even if the overall performance for your
application is sufficient, you should at least make a plan for each
bottleneck, and decide how to solve it if someday you really need the
extra performance.


this, you can try different solutions to your problem and test which is
really the fastest solution for you.

Another free benchmark suite is the `Open Source Database Benchmark'
which may be found at `http://osdb.sourceforge.net/'.

It is very common that some problems only occur when the system is very
heavily loaded. We have had many customers who contact us when they
have a (tested) system in production and have encountered load
problems. In every one of these cases so far, it has been problems with
basic design (table scans are *not good* at high load) or OS/Library
issues. Most of this would be a *lot* easier to fix if the systems were
not already in production.

To avoid problems like this, you should put some effort into
benchmarking your whole application under the worst possible load!  You
can use Super Smack for this, and it is available at:
`http://www.mysql.com/Downloads/super-smack/super-smack-1.0.tar.gz'.
As the name suggests, it can bring your system down to its knees if you
ask it, so make sure to use it only on your development systems.



[Назад] [Содержание] [Вперед]

Главная