Using TokuDB
Warning
Do not move or modify any TokuDB files. You will break the database, and you will need to recover it from a backup.
Fast Insertions and Richer Indexes
TokuDB’s fast indexing enables fast queries through the use of rich indexes, such as covering and clustering indexes. It’s worth investing some time to optimize index definitions to get the best performance from MySQL and TokuDB. Here are some resources to get you started:
- “Understanding Indexing” by Zardosht Kasheff (video)
- How Clustering Indexes Sometimes Helps UPDATE and DELETE Performance
- High Performance MySQL, 3rd Edition by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Copyright 2012, O’Reilly Media. See Chapter 5, Indexing for High Performance.
Clustering Secondary Indexes
One of the keys to exploiting TokuDB’s strength in indexing is to make use of clustering secondary indexes.
TokuDB allows a secondary key to be defined as a clustering key. This means that all of the columns in the table are clustered with the secondary key. The Percona Server for MySQL parser and query optimizer support multiple clustering keys when the TokuDB engine is used. This means that the query optimizer will avoid primary clustered index reads and replace them with secondary clustered index reads in certain scenarios.
The parser has been extended to support the following syntax:
CREATE TABLE ... ( ..., CLUSTERING KEY identifier (column list), ...
CREATE TABLE ... ( ..., UNIQUE CLUSTERING KEY identifier (column list), ...
CREATE TABLE ... ( ..., CLUSTERING UNIQUE KEY identifier (column list), ...
CREATE TABLE ... ( ..., CONSTRAINT identifier UNIQUE CLUSTERING KEY identifier (column list), ...
CREATE TABLE ... ( ..., CONSTRAINT identifier CLUSTERING UNIQUE KEY identifier (column list), ...
CREATE TABLE ... (... column type CLUSTERING [UNIQUE] [KEY], ...)
CREATE TABLE ... (... column type [UNIQUE] CLUSTERING [KEY], ...)
ALTER TABLE ..., ADD CLUSTERING INDEX identifier (column list), ...
ALTER TABLE ..., ADD UNIQUE CLUSTERING INDEX identifier (column list), ...
ALTER TABLE ..., ADD CLUSTERING UNIQUE INDEX identifier (column list), ...
ALTER TABLE ..., ADD CONSTRAINT identifier UNIQUE CLUSTERING INDEX identifier (column list), ...
ALTER TABLE ..., ADD CONSTRAINT identifier CLUSTERING UNIQUE INDEX identifier (column list), ...
CREATE CLUSTERING INDEX identifier ON ...
To define a secondary index as clustering, simply add the word CLUSTERING before the key definition. For example:
CREATE TABLE foo (
column_a INT,
column_b INT,
column_c INT,
PRIMARY KEY index_a (column_a),
CLUSTERING KEY index_b (column_b)) ENGINE = TokuDB;
In the previous example, the primary table is indexed on column_a. Additionally, there is a secondary clustering index (named index_b) sorted on column_b. Unlike non-clustered indexes, clustering indexes include all the columns of a table and can be used as covering indexes. For example, the following query will run very fast using the clustering index_b:
SELECT column_c
FROM foo
WHERE column_b BETWEEN 10 AND 100;
This index is sorted on column_b, making the WHERE clause fast, and includes column_c, which avoids lookups in the primary table to satisfy the query.
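A clustering index can also be added to an existing table using the ALTER TABLE form of the syntax listed above. As a sketch (the index name index_c is chosen for illustration), clustering the table foo on column_c as well would look like:
mysql> ALTER TABLE foo ADD CLUSTERING INDEX index_c (column_c);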
TokuDB makes clustering indexes feasible because of its excellent compression and very high indexing rates. For more information about using clustering indexes, see Introducing Multiple Clustering Indexes.
Hot Index Creation
TokuDB enables you to add indexes to an existing table and still perform inserts and queries on that table while the index is being created.
The ONLINE keyword is not used. Instead, the value of the tokudb_create_index_online client session variable is examined.
Hot index creation is invoked using the CREATE INDEX command after setting tokudb_create_index_online to on, as follows:
mysql> SET tokudb_create_index_online=on;
The output should resemble the following:
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE INDEX idx ON foo (field_name);
Alternatively, using the ALTER TABLE command for creating an index will create the index offline (with the table unavailable for inserts or queries), regardless of the value of tokudb_create_index_online. The only way to hot create an index is to use the CREATE INDEX command.
Hot creating an index will be slower than creating the index offline, and progress depends on how busy the mysqld server is with other tasks. Progress of the index creation can be seen by using the SHOW PROCESSLIST command (in another client). Once the index creation completes, the new index will be used in future query plans.
If more than one hot CREATE INDEX is issued for a particular table, the indexes will be created serially. An index creation that is waiting for another to complete will be shown as Locked in SHOW PROCESSLIST. We recommend that each CREATE INDEX be allowed to complete before the next one is started.
Hot Column Add, Delete, Expand, and Rename (HCADER)
TokuDB enables you to add or delete columns in an existing table, expand char, varchar, varbinary, and integer type columns in an existing table, or rename an existing column in a table with little blocking of other updates and queries. HCADER typically blocks other queries with a table lock for no more than a few seconds. After that initial short-term table locking, the system modifies each row (when adding, deleting, or expanding columns) later, when the row is next brought into main memory from disk. For column rename, all the work is done during the seconds of downtime. On-disk rows need not be modified.
To get good performance from HCADER, observe the following guidelines:
- The work of altering the table for column addition, deletion, or expansion is performed as subsequent operations touch parts of the Fractal Tree, both in the primary index and secondary indexes. You can force the column addition, deletion, or expansion work to be performed all at once using the standard syntax of OPTIMIZE TABLE X, when a column has been added to, deleted from, or expanded in table X (see the sketch after this list). It is important to note that as of TokuDB version 7.1.0, OPTIMIZE TABLE is also hot, so that a table supports updates and queries without blocking while an OPTIMIZE TABLE is being performed. Also, a hot OPTIMIZE TABLE does not rebuild the indexes, since TokuDB indexes do not age. Rather, it flushes all background work, such as that induced by a hot column addition, deletion, or expansion.
- Each hot column addition, deletion, or expansion operation must be performed individually (with its own SQL statement). If you want to add, delete, or expand multiple columns, use multiple statements.
- Avoid adding, deleting, or expanding a column at the same time as adding or dropping an index.
- The time that the table lock is held can vary. The table-locking time for HCADER is dominated by the time it takes to flush dirty pages, because MySQL closes the table after altering it. If a checkpoint has happened recently, this operation is fast (on the order of seconds). However, if the table has many dirty pages, then the flushing stage can take on the order of minutes.
- Avoid dropping a column that is part of an index. If a column to be dropped is part of an index, then dropping that column is slow. To drop a column that is part of an index, first drop the indexes that reference the column in one ALTER TABLE statement, and then drop the column in another statement.
- Hot column expansion operations are only supported for char, varchar, varbinary, and integer data types. Hot column expansion is not supported if the given column is part of the primary key or any secondary keys.
- Rename only one column per statement. Renaming more than one column will revert to the standard MySQL blocking behavior. The proper syntax is as follows:
ALTER TABLE table
CHANGE column_old column_new
DATA_TYPE REQUIRED_NESS DEFAULT
Here’s an example of how that might look:
ALTER TABLE table
CHANGE column_old column_new
INT(10) NOT NULL;
Notice that all of the column attributes must be specified. ALTER TABLE table CHANGE column_old column_new; induces a slow, blocking column rename.
- Hot column rename does not support the following data types: TIME, ENUM, BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB. Renaming columns of these types will revert to the standard MySQL blocking behavior.
- Temporary tables cannot take advantage of HCADER. Temporary tables are typically small anyway, so altering them using the standard method is usually fast.
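As a sketch of the workflow above (the table foo and column column_d are hypothetical names), a hot column addition followed by an explicit flush of the deferred row modifications might look like:
mysql> ALTER TABLE foo ADD COLUMN column_d INT NULL;
mysql> OPTIMIZE TABLE foo;
The OPTIMIZE TABLE step is optional; without it, rows are modified lazily as they are next brought into main memory from disk.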
Compression Details
TokuDB offers different levels of compression, which trade off between the amount of CPU used and the compression achieved. Standard compression uses less CPU but generally compresses at a lower level; high compression uses more CPU and generally compresses at a higher level. We have seen compression up to 25x on customer data.
Compression in TokuDB occurs on background threads, which means that high compression need not slow down your database. Indeed, in some settings, we’ve seen higher overall database performance with high compression.
Note
We recommend that users use standard compression on machines with six or fewer cores, and high compression on machines with more than six cores.
The ultimate choice depends on the particulars of how a database is used, and we recommend that users use the default settings unless they have profiled their system with high compression in place.
Compression is set on a per-table basis and is controlled by setting the row format during a CREATE TABLE or ALTER TABLE. For example:
CREATE TABLE table (
column_a INT NOT NULL PRIMARY KEY,
column_b INT NOT NULL) ENGINE=TokuDB
ROW_FORMAT=row_format;
If no row format is specified in a CREATE TABLE, the table is compressed using whichever row format is specified in the session variable tokudb_row_format. If neither the row format nor tokudb_row_format is set, the zlib compressor is used. The row_format and tokudb_row_format variables accept the following values:
- TOKUDB_DEFAULT: Sets the compression to the default behavior. As of TokuDB 7.1.0, the default behavior is to compress using the zlib library. In the future this behavior may change.
- TOKUDB_FAST: Sets the compression to use the quicklz library.
- TOKUDB_SMALL: Sets the compression to use the lzma library.
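For example, a session that should compress all subsequently created tables with the quicklz library could set the session variable as follows (a sketch):
mysql> SET SESSION tokudb_row_format=TOKUDB_FAST;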
In addition, you can choose a compression library directly, which will override previous values. The following libraries are available:
- TOKUDB_ZLIB: Compresses using the zlib library, which provides mid-range compression and CPU utilization.
- TOKUDB_QUICKLZ: Compresses using the quicklz library, which provides light compression and low CPU utilization.
- TOKUDB_LZMA: Compresses using the lzma library, which provides the highest compression and high CPU utilization.
- TOKUDB_SNAPPY: Compresses using the snappy library, which aims for very high speed and reasonable compression.
- TOKUDB_UNCOMPRESSED: Turns off compression; useful for tables with data that cannot be compressed.
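For example, a table compressed directly with the lzma library could be created as follows (bar is a hypothetical table name):
CREATE TABLE bar (
column_a INT NOT NULL PRIMARY KEY,
column_b INT NOT NULL) ENGINE=TokuDB
ROW_FORMAT=TOKUDB_LZMA;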
Changing Compression of a Table
Modify the compression used on a particular table with the following command:
ALTER TABLE table
ROW_FORMAT=row_format;
Note
Changing the compression of a table only affects newly written data (dirtied blocks). After changing a table’s compression, you can run OPTIMIZE TABLE to rewrite all blocks of the table and its indexes.
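For example, switching a hypothetical table foo to the quicklz compressor and then rewriting its existing blocks might look like:
mysql> ALTER TABLE foo ROW_FORMAT=TOKUDB_FAST;
mysql> OPTIMIZE TABLE foo;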
Read Free Replication
TokuDB replicas can be configured to perform significantly less read IO in order to apply changes from the source, by utilizing the power of Fractal Tree indexes:
- insert/update/delete operations can be configured to eliminate read-modify-write behavior and simply inject messages into the appropriate Fractal Tree indexes
- update/delete operations can be configured to eliminate the IO required for uniqueness checking
To enable Read Free Replication, the servers must be configured as follows:
- On the replication source:
  - Enable row based replication: set BINLOG_FORMAT=ROW
- On the replication replica(s):
  - The replica must be in read-only mode: set read_only=1
  - Disable unique checks: set tokudb_rpl_unique_checks=0
  - Disable lookups (read-modify-write): set tokudb_rpl_lookup_rows=0
Note
You can modify one or both behaviors on the replica(s).
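As a sketch, these settings can be applied at runtime as follows (placing them in the server configuration file instead makes them survive a restart; depending on the version, the replication threads may need to be restarted to pick up new values):
On the source:
mysql> SET GLOBAL binlog_format=ROW;
On each replica:
mysql> SET GLOBAL read_only=1;
mysql> SET GLOBAL tokudb_rpl_unique_checks=0;
mysql> SET GLOBAL tokudb_rpl_lookup_rows=0;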
Note
As long as the source is using row based replication, this optimization is available on a TokuDB replica. This means that it’s available even if the source is using InnoDB or MyISAM tables, or running non-TokuDB binaries.
Warning
TokuDB Read Free Replication will not propagate UPDATE and DELETE events reliably if a TokuDB table is missing a primary key, which will eventually lead to data inconsistency on the replica.
Transactions and ACID-compliant Recovery
By default, TokuDB checkpoints all open tables regularly and logs all changes between checkpoints, so that after a power failure or system crash, TokuDB will restore all tables into their fully ACID-compliant state. That is, all committed transactions will be reflected in the tables, and any transaction not committed at the time of failure will be rolled back.
The default checkpoint period is every 60 seconds, and this specifies the time from the beginning of one checkpoint to the beginning of the next. If a checkpoint requires more than the defined checkpoint period to complete, the next checkpoint begins immediately. It is also related to the frequency with which log files are trimmed, as described below. The user can induce a checkpoint at any time by issuing the FLUSH LOGS command. When a database is shut down normally, it is also checkpointed and all open transactions are aborted. The logs are trimmed at startup.
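For example, to induce a checkpoint immediately, or to lengthen the checkpoint period (tokudb_checkpointing_period holds the period in seconds):
mysql> FLUSH LOGS;
mysql> SET GLOBAL tokudb_checkpointing_period=120;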
Managing Log Size
TokuDB keeps log files back to the most recent checkpoint. Whenever a log file reaches 100 MB, a new log file is started. Whenever there is a checkpoint, all log files older than the checkpoint are discarded. If the checkpoint period is set to be a very large number, logs will get trimmed less frequently. This value is set to 60 seconds by default.
TokuDB also keeps rollback logs for each open transaction. The size of each log is proportional to the amount of work done by its transaction and is stored compressed on disk. Rollback logs are trimmed when the associated transaction completes.
Recovery
Recovery is fully automatic with TokuDB. TokuDB uses both the log files and rollback logs to recover from a crash. The time to recover from a crash is proportional to the combined size of the log files and uncompressed size of rollback logs. Thus, if there were no long-standing transactions open at the time of the most recent checkpoint, recovery will take less than a minute.
Disabling the Write Cache
When using any transaction-safe database, it is essential that you understand the write-caching characteristics of your hardware. TokuDB provides transaction-safe (ACID compliant) data storage for MySQL. However, if the underlying operating system or hardware does not actually write data to disk when it says it did, the system can corrupt your database when the machine crashes. For example, TokuDB cannot guarantee proper recovery if it is mounted on an NFS volume. It is always safe to disable the write cache, but you may be giving up some performance.
For most configurations you must disable the write cache on your disk drives. On ATA/SATA drives, the following command should disable the write cache:
$ hdparm -W0 /dev/hda
There are some cases when you can keep the write cache, for example:
- Write caching can remain enabled when using XFS, but only if XFS reports that disk write barriers work. If you see one of the following messages in /var/log/messages, then you must disable the write cache:
  - Disabling barriers, not supported with external log device
  - Disabling barriers, not supported by the underlying device
  - Disabling barriers, trial barrier write failed
  XFS write barriers appear to succeed for single disks (with no LVM), or for very recent kernels (such as that provided by Fedora 12). For more information, see the XFS FAQ.
In the following cases, you must disable the write cache:
- If you use the ext3 filesystem
- If you use LVM (although recent Linux kernels, such as Fedora 12, have fixed this problem)
- If you use Linux’s software RAID
- If you use a RAID controller with battery-backed-up memory. This may seem counter-intuitive; for more information, see the XFS FAQ
In summary, you should disable the write cache, unless you have a very specific reason not to do so.
Progress Tracking
TokuDB has a system for tracking progress of long running statements, thereby removing the need to define triggers to track statement execution, as follows:
- Bulk Load: When loading large tables using LOAD DATA INFILE commands, doing a SHOW PROCESSLIST command in a separate client session shows progress. There are two progress stages. The first will state something like Inserted about 1000000 rows. After all rows are processed like this, the next stage tracks progress by showing what fraction of the work is done (e.g. Loading of data about 45% done).
- Adding Indexes: When adding indexes via ALTER TABLE or CREATE INDEX, the command SHOW PROCESSLIST shows progress, including an estimate of the number of rows processed. Use this information to verify that progress is being made. Similar to bulk loading, the first stage shows how many rows have been processed, and the second stage shows progress with a fraction.
- Commits and Aborts: When committing or aborting a transaction, the command SHOW PROCESSLIST will include an estimate of the transactional operations processed.
Migrating to TokuDB
To convert an existing table to use the TokuDB engine, run ALTER TABLE ... ENGINE=TokuDB. If you wish to load from a file, use LOAD DATA INFILE and not mysqldump. Using mysqldump will be much slower. To create a file that can be loaded with LOAD DATA INFILE, refer to the INTO OUTFILE option of the SELECT syntax.
Note
Creating this file does not save the schema of your table, so you may want to create a copy of that as well.
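As a sketch, where foo, foo_new, and the file path are hypothetical names:
mysql> ALTER TABLE foo ENGINE=TokuDB;
or, for a file-based load:
mysql> SELECT * FROM foo INTO OUTFILE '/tmp/foo.txt';
mysql> LOAD DATA INFILE '/tmp/foo.txt' INTO TABLE foo_new;
Here foo_new must already exist; its definition can be copied from the output of SHOW CREATE TABLE foo, edited to use ENGINE=TokuDB.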