Bienvenido! - Willkommen! - Welcome!

Bitácora Técnica de Tux&Cía., Santa Cruz de la Sierra, BO
Bitácora Central: Tux&Cía.
Bitácora de Información Avanzada: Tux&Cía.-Información
May the source be with you!

Thursday, December 2, 2010

MySQL Storage Engines

Source
This article offers a discussion on the plethora of storage engines available for use with late releases of the MySQL database management system (DBMS). Our focus will be on the two most widely-deployed — and talked about — engines, InnoDB and MyISAM.
This article has been written with the developer in mind; particularly those without convenient access to a local MySQL database administrator or out-of-hours guru.
What is a storage engine?
It is not uncommon to hear of developers who have worked with MySQL for some time before realising that the root cause of their poor-performing application is a direct result of their choice of storage engine. This can seem confusing at first, as no conscious choice has often been made. Developers infrequently need to think about such low-level details by design: how your DBMS structures your high-level view of tables on disk is rarely significant to the task at hand, so it is abstracted away.
A MySQL storage engine is best thought of as the end of the line in terms of a query's reach into the DBMS. It is the storage engine's routines that govern the translation to on-disk, machine-friendly data formats (and the same in reverse). A storage engine will impose various functional limits on your application, some more significant than others. (Limitations for the InnoDB and MyISAM storage engines are discussed below.) Heavily-loaded applications that are functionally compatible with more than one storage engine may suffer performance setbacks when thoughtlessly paired with a particular engine. Complex, well-designed applications layered on top of MySQL are often designed to work with more than one storage engine to gain the best of all worlds.
Storage engines in MySQL can be broadly categorised as either transactional or non-transactional. MySQL's default storage engine (as of MySQL 5.0.67) is the MyISAM non-transactional storage engine.
MyISAM
MyISAM is most often described as MySQL's most mature and performant storage engine. While the maturity of the codebase is up for discussion, this engine's supposed performance supremacy should come bundled with a bright disclaimer: it is generally ineffective at handling large, concurrent loads with all but the most trivial of write-to-read query mixes. MyISAM is not a transactional storage engine, nor does it support fine-grained locking. As a result of which, most writes (all UPDATE statements and most INSERT statements) to a MyISAM table will result in an exclusive lock being taken on the entire table. Most importantly, the MyISAM storage engine is not a magic bullet for all application use-cases. As a developer, you should assess its suitability before accepting it (often as a default).

Limitations

  1. Crash recovery can be a time-consuming process owing to MyISAM's lack of a transaction log. Expect to have to perform such a recovery during your peak-usage period. (If it can happen, it almost certainly will.)
  2. MyISAM does not support or enforce foreign key constraints.
  3. All UPDATE queries to the same table are serialised — that is to say they carried out one at a time — and block all other queries, including SELECTs, from executing. This effect is pronounced on most busy multi-user applications.
  4. MyISAM supports concurrent INSERTs only in certain cases.
  5. MyISAM tables created on 32-bit operating systems or using MySQL releases earlier than version 5.0.6 will be limited to 4 GB in size. This limit can be overcome after migrating to a 64-bit host or updating the MySQL server to a later software release.
When to use MyISAM
  1. Your application demands full-text search capabilities. Rather than push all your data into MyISAM tables to gain full-text searching, it may be viable to split your dataset into data that must be indexed for full-text searching — and stored using MyISAM — and data that should be stored using a transactional engine, such as InnoDB. A scheduled background job may then asynchronously update your MyISAM full-text indexes and provide links from the InnoDB data as appropriate. This is a common example of how to gain the best from all worlds.
  2. Your application is effectively single-user — there are very few concurrent queries hitting the MySQL server.
  3. You are performing limited testing or development where performance is not under scrutiny.
InnoDB
InnoDB is MySQL's general purpose transactional storage engine. It is only with this storage engine that MySQL can claim ACID compliance. InnoDB's featureset is largely taken for granted by Microsoft SQL Server and PostgreSQL Server database administrators.
Limitations
  1. Queries that result in large scans of the tablespace are often slower when using InnoDB.
  2. Consistency is only maintained if the underlying operating system and hardware can guarantee buffer flushes. This limitation is inherent in all transactional database management systems.
  3. InnoDB tables consume a greater amount of space on-disk than their MyISAM equivalents. This is now largely irrelevant given the ubiquity of large (multiple hundreds of gigabytes) hard disk drives.

When to use InnoDB

Or, stated with greater effect, when not to use MyISAM:
  1. You are developing an application that requires ACID compliance. At the very least, your application demands the storage layer support the notion of transactions.
  2. You require expedient crash recovery. Almost all production sites fall into this category, however MyISAM table recovery times will obviously vary from one usage pattern to the next. To estimate an accurate figure for your environment, try running myisamchk over a many-gigabyte table from your application's backups on hardware similar to what you have in production. While recovery times of MyISAM tables increase with growth of the table, InnoDB table recovery times remain largely constant throughout the life of the table.
  3. Your web site or application is mostly multi-user. The database is having to deal with frequent UPDATEs to a single table and you would like to make better use of your multi-processing hardware.
Creating one-off InnoDB tables
You are given a chance to specify the storage engine to use for a particular table at creation time. Simply append the ENGINE attribute to the CREATE TABLE statement as follows:
CREATE TABLE innodb_example (
  ...
) ENGINE = InnoDB;
Converting a single table from MyISAM to InnoDB
First, some obligatory scary warnings: Never attempt this change on a production environment first. This change should not be considered trivial. Plan and carry out comprehensive testing on a throwaway dataset beforehand. Referential integrity constraints are stored in MyISAM tables but not enforced. This can lead to a nasty shock upon moving to InnoDB, where suddenly, dormant constraints spring to life and cripple once-functional INSERT and UPDATE queries.
You are not required to perform a complete dump-and-restore when converting from one storage engine to another. The following pair of statements will suffice:
SET SESSION sql_mode='NO_ENGINE_SUBSTITUTION';
ALTER TABLE table_name ENGINE = InnoDB;
As usual with ALTER TABLE statements, reads (SELECTs) from the table will return successfully while the ALTER TABLE statement is executing. Writes (INSERTs and UPDATEs) will be blocked until the ALTER TABLE statement has completed at which point they will be processed as usual.
Now for a practical example.
First, populate a table with some sample data. Here is what the trivial sample table looks like:
CREATE DATABASE egads;

CREATE TABLE batman (
  foo INTEGER AUTO_INCREMENT,
  bar BOOLEAN,
  baz CHAR(255) CHARACTER SET ascii COLLATE ascii_general_ci,
  CONSTRAINT PRIMARY KEY (foo)
) ENGINE = MyISAM;
Note that I have explicitly ordered the use of the MyISAM storage engine for this table.
Cue a hacky Python script to generate a couple gigabytes worth of random data:
import MySQLdb
import random

conn = MySQLdb.connect (host = "localhost",
  user = "root",
  passwd = "moo",
  db = "egads")
cursor = conn.cursor ()
for i in range (1, 10000000):
  cursor.execute ('INSERT INTO batman (bar, baz) VALUES (' +
    str (random.randint(0,1)) + ', ' +
    str (random.randint(0,999999999)) + ');')
  print i
cursor.close ()
conn.close ()
This script will take a few minutes to run. The result (trimmed for brevity):
mysql> show table status like 'batman';
| Name   | Engine | Data_length |
| batman | MyISAM | 2609999739  |
# ls -lhF egads/
total 2.6G
-rw-rw---- 1 mysql mysql 8.5K 2008-11-24 15:57 batman.frm
-rw-rw---- 1 mysql mysql 2.5G 2008-11-24 16:14 batman.MYD
-rw-rw---- 1 mysql mysql  98M 2008-11-24 16:14 batman.MYI
Call it about 2.5 gigabytes of table data.
We are most interested in knowing how long it will take to convert that MyISAM tablespace into data that can be used with InnoDB:
$ time echo 'SET SESSION sql_mode='NO_ENGINE_SUBSTITUTION'; USE egads; ALTER TABLE batman ENGINE = InnoDB;' | mysql;

real    7m34.190s
user    0m0.008s
sys     0m0.004s
mysql> show table status like 'batman';
| Name   | Engine | Data_length |
| batman | InnoDB | 3038773248  |
That roughly equates to three (3) minutes per gigabyte.
For reference, these figures were generated on a GNU/Linux system running kernel release 2.6.27 and MySQL 5.0.67 on an Intel Core 2 Duo E8400 with 2 GB of DDR2 SDRAM and an inexpensive, desktop-grade SATA I/O system. This load was largely I/O-bound.
Using InnoDB without all that CREATE TABLE...ENGINE noise
The default storage engine in MySQL is not fixed (to MyISAM). Supposing your next project would benefit from having a rich mix of InnoDB-to-MyISAM tables (or you require total ACID compliance), you can save yourself some extra finger work by specifying the following additional configuration directive when starting the MySQL daemon:
--default-storage-engine=InnoDB
Any CREATE TABLE statements given without an ENGINE= option (or with a non-existent ENGINE) will now default to InnoDB.
Applies to
MySQL 5 (5.0.67 and above).

No comments: