Skip to content
logo
Percona Server for MySQL
Compressed columns with dictionaries
Initializing search
    percona/psmysql-docs
    percona/psmysql-docs
    • Home
      • The Percona XtraDB storage engine
      • List of features available in Percona Server for MySQL releases
      • Percona Server for MySQL feature comparison
      • Changed in Percona Server 8.0
      • Understand version numbers
      • Install Percona Server for MySQL
        • Install using APT repository
        • Files in DEB package
        • Build APT packages
        • Install from downloaded DEB packages
        • Apt pinning
        • Run Percona Server for MySQL
        • Uninstall
        • Install using a RPM repository
        • Files in RPM package
        • Install from downloaded RPM packages
        • Run Percona Server for MySQL
        • Uninstall
        • Install with binary tarballs
        • Binary tarballs available
        • Install Percona Server for MySQL from a source tarball
        • Compile Percona Server for MySQL 8.0 from source
      • Post-installation
      • Percona Server for MySQL in-place upgrading guide: from 5.7 to 8.0
      • Upgrade using the Percona repositories
      • Upgrade from systems that use the MyRocks or TokuDB storage engine and partitioned tables
      • Upgrade using Standalone Packages
      • Downgrade Percona Server for MySQL
      • Running Percona Server for MySQL in a Docker Container
      • Docker environment variables
      • Improved InnoDB I/O scalability
      • Adaptive network buffers
      • Multiple page asynchronous I/O requests
      • Thread pool
      • XtraDB performance improvements for I/O-bound highly-concurrent workloads
      • Prefix index queries optimization
      • Limit the estimation of records in a Query
      • Jemalloc memory allocation profiling
      • The ProcFS plugin
      • Binary logs and replication improvements
      • Compressed columns with dictionaries
        • Specifications
        • Compression dictionary support
          • Interaction with innodb_force_recovery variable
          • Example
        • INFORMATION_SCHEMA Tables
          • INFORMATION_SCHEMA.COMPRESSION_DICTIONARY
          • INFORMATION_SCHEMA.COMPRESSION_DICTIONARY_TABLES
        • Limitations
        • mysqldump command line parameters
        • Version specific information
        • System variables
          • innodb_compressed_columns_zip_level
          • innodb_compressed_columns_threshold
      • Extended SELECT INTO OUTFILE/DUMPFILE
      • Extended SET VAR optimizer hint
      • Improved MEMORY storage engine
      • Suppress warning messages
      • Limiting the disk space used by binary log files
      • Support for PROXY protocol
      • SEQUENCE_TABLE(n) function
      • Slow query log rotation and expiration
      • Trigger updates
      • Extended mysqlbinlog
      • Extended mysqldump
      • InnoDB full-text search improvements
      • Too many connections warning
      • Handle corrupted tables
      • Percona Toolkit UDFs
      • Kill idle transactions
      • XtraDB changed page tracking
      • Enforcing storage engine
      • Expanded fast index creation
      • Backup locks
      • Audit log plugin
      • Start transaction with consistent snapshot
      • Extended SHOW GRANTS
      • Utility user
      • Working with SELinux
      • Working with AppArmor
      • PAM authentication plugin
      • Server variables
      • SSL improvements
      • Data masking
        • Using LDAP authentication plugins
        • LDAP authentication plugin system variables
        • Data at Rest Encryption
        • Use the keyring component or keyring plugin
          • Using the Key Management Interoperability Protocol (KMIP)
          • Use the Amazon Key Management Service (AWS KMS)
          • FIDO authentication plugin
        • Encryption functions
        • Encrypt File-Per-Table Tablespace
        • Encrypt schema or general tablespace
        • Encrypt system tablespace
        • Encrypt temporary files
        • Encrypt Binary Log Files and Relay Log Files
        • Encrypting the Redo Log data
        • Encrypt the undo tablespace
        • Rotate the master key
        • Advanced encryption key rotation
        • Encrypt doublewrite buffers
        • Verify the encryption for tables, tablespaces, and schemas
      • User statistics
      • Slow query log
      • Extended show engine InnoDB status
      • Show storage engines
      • Process list
      • Misc. INFORMATION_SCHEMA tables
      • Thread based profiling
      • InnoDB page fragmentation counters
      • Stacktrace
      • Libcoredumper
      • Manage group replication flow control
      • Group replication system variables
      • Percona MyRocks introduction
      • Percona MyRocks installation guide
      • Updated supported features
      • MyRocks limitations
      • Differences between Percona MyRocks and Facebook MyRocks
      • MyRocks Information Schema tables
      • MyRocks server variables
      • MyRocks status variables
      • Gap locks detection
      • Data loading
      • Installing and configuring Percona Server for MySQL with ZenFS support
      • TokuDB introduction
      • TokuDB installation
      • Use TokuDB
      • Fast updates with TokuDB
      • TokuDB files and file types
      • TokuDB file management
      • TokuDB background ANALYZE TABLE
      • TokuDB variables
      • TokuDB status variables
      • TokuDB fractal tree indexing
      • TokuDB troubleshooting
      • TokuDB Performance Schema integration
      • Frequently asked questions
      • Migrate and removing the TokuDB storage engine
      • Percona TokuBackup
      • Release notes index
      • Percona Server for MySQL 8.0.32-24 (2023-03-20)
      • Percona Server for MySQL 8.0.31-23 (2023-02-09)
      • Percona Server for MySQL 8.0.30-22 Update (2022-11-21)
      • Percona Server for MySQL 8.0.30-22 (2022-11-21)
      • Percona Server for MySQL 8.0.29-21 (2022-08-08)
      • Percona Server for MySQL 8.0.28-20 (2022-06-20)
      • Percona Server for MySQL 8.0.28-19 (2022-05-12)
      • Percona Server for MySQL 8.0.27-18 (2022-03-02)
      • Percona Server for MySQL 8.0.26-17 (2022-01-26)
      • Percona Server for MySQL 8.0.26-16 (2021-10-20)
      • Percona Server for MySQL 8.0.25-15 (2021-07-13)
      • Percona Server for MySQL 8.0.23-14 (2021-05-12)
      • Percona Server for MySQL 8.0.22-13 (2020-12-14)
      • Percona Server for MySQL 8.0.21-12 (2020-10-13)
      • Percona Server for MySQL 8.0.20-11 (2020-07-21)
      • Percona Server for MySQL 8.0.19-10 (2020-03-23)
      • Percona Server for MySQL 8.0.18-9
      • Percona Server for MySQL 8.0.17-8
      • Percona Server for MySQL 8.0.16-7
      • Percona Server for MySQL 8.0.15-6
      • Percona Server for MySQL 8.0.15-5
      • Percona Server for MySQL 8.0.14
      • Percona Server for MySQL 8.0.13-4
      • Percona Server for MySQL 8.0.13-3
      • Percona Server for MySQL 8.0.12-2rc1
      • List of variables introduced in Percona Server for MySQL 8.0
      • Development of Percona Server for MySQL
      • Trademark policy
      • Index of INFORMATION_SCHEMA tables
      • Frequently asked questions
      • Copyright and licensing information
      • Glossary

    • Specifications
    • Compression dictionary support
      • Interaction with innodb_force_recovery variable
      • Example
    • INFORMATION_SCHEMA Tables
      • INFORMATION_SCHEMA.COMPRESSION_DICTIONARY
      • INFORMATION_SCHEMA.COMPRESSION_DICTIONARY_TABLES
    • Limitations
    • mysqldump command line parameters
    • Version specific information
    • System variables
      • innodb_compressed_columns_zip_level
      • innodb_compressed_columns_threshold

    Compressed columns with dictionaries¶

    The per-column compression feature is a data type modifier, independent from user-level SQL and InnoDB data compression, that causes the data stored in the column to be compressed on writing to storage and decompressed on reading. For all other purposes, the data type is identical to the one without the modifier, i.e. no new data types are created. Compression is done by using the zlib library.

    Additionally, it is possible to pre-define a set of strings for each compressed column to achieve a better compression ratio on relatively small individual data items.

    This feature provides:

    • a better compression ratio for text data which consists of a large number of predefined words (e.g. JSON or XML) using compression methods with static dictionaries

    • a way to select columns in the table to compress (in contrast to the InnoDB row compression method). This feature is based on a patch provided by Weixiang Zhai.

    Specifications¶

    The feature is limited to InnoDB/XtraDB storage engine and to columns of the following data types:

    • BLOB (including TINYBLOB, MEDIUMBLOB, LONGBLOG)

    • TEXT (including TINYTEXT, MEDUUMTEXT, LONGTEXT)

    • VARCHAR (including NATIONAL VARCHAR)

    • VARBINARY

    • JSON

    A compressed column is declared by using the syntax that extends the existing COLUMN_FORMAT modifier: COLUMN_FORMAT COMPRESSED. If this modifier is applied to an unsupported column type or storage engine, an error is returned.

    The compression can be specified:

    • when creating a table: CREATE TABLE ... (..., foo BLOB COLUMN_FORMAT COMPRESSED, ...);

    • when altering a table and modifying a column to the compressed format: ALTER TABLE ... MODIFY [COLUMN] ... COLUMN_FORMAT COMPRESSED, or ALTER TABLE ... CHANGE [COLUMN] ... COLUMN_FORMAT COMPRESSED.

    Unlike Oracle MySQL, compression is applicable to generated stored columns. Use this syntax extension as follows:

    mysql> CREATE TABLE t1(
           id INT,
           a BLOB,
           b JSON COLUMN_FORMAT COMPRESSED,
           g BLOB GENERATED ALWAYS AS (a) STORED COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY numbers
         ) ENGINE=InnoDB;
    

    To decompress a column, specify a value other than COMPRESSED to COLUMN_FORMAT: FIXED, DYNAMIC, or DEFAULT. If there is a column compression/decompression request in an ALTER TABLE, it is forced to the COPY algorithm.

    Two new variables: innodb_compressed_columns_zip_level and innodb_compressed_columns_threshold have been implemented.

    Compression dictionary support¶

    To achieve a better compression ratio on relatively small individual data items, it is possible to predefine a compression dictionary, which is a set of strings for each compressed column.

    Compression dictionaries can be represented as a list of words in the form of a string (a comma or any other character can be used as a delimiter although not required). In other words, a, bb, ccc, a bb ccc, and abbccc will have the same effect. However, the latter is more compact. The Quote symbol quoting is handled by regular SQL quoting. The maximum supported dictionary length is 32506 bytes (zlib limitation).

    The compression dictionary is stored in a new system InnoDB table. As this table is of the data dictionary kind, concurrent reads are allowed, but writes are serialized, and reads are blocked by writes. Table read through old read views are not supported, similar to InnoDB internal DDL transactions.

    Interaction with innodb_force_recovery variable¶

    Compression dictionary operations are treated like DDL operations with the exception when innodb_force_value is set to 3: with values less than 3, compression dictionary operations are allowed, and with values >= 3, they are forbidden.

    Note

    Prior to Percona Server for MySQL 8.0.15-6 using Compression dictionary operations with innodb_force_recovery variable set to value > 0 would result in an error.

    Example¶

    In order to use the compression dictionary, you need to create it. This can be done by running:

    mysql> SET @dictionary_data = 'one' 'two' 'three' 'four';
    
    Expected output
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> CREATE COMPRESSION_DICTIONARY numbers (@dictionary_data);
    
    Expected output
    Query OK, 0 rows affected (0.00 sec)
    

    To create a table that has both compression and compressed dictionary support you should run:

    mysql> CREATE TABLE t1(
            id INT,
            a BLOB COLUMN_FORMAT COMPRESSED,
            b BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY numbers
          ) ENGINE=InnoDB;
    

    The following example shows how to insert a sample of JSON data into the table:

    SET @json_value =
    '[\n'
    ' {\n'
    ' "one" = 0,\n'
    ' "two" = 0,\n'
    ' "three" = 0,\n'
    ' "four" = 0\n'
    ' },\n'
    ' {\n'
    ' "one" = 0,\n'
    ' "two" = 0,\n'
    ' "three" = 0,\n'
    ' "four" = 0\n'
    ' },\n'
    ' {\n'
    ' "one" = 0,\n'
    ' "two" = 0,\n'
    ' "three" = 0,\n'
    ' "four" = 0\n'
    ' },\n'
    ' {\n'
    ' "one" = 0,\n'
    ' "two" = 0,\n'
    ' "three" = 0,\n'
    ' "four" = 0\n'
    ' }\n'
    ']\n'
    ;
    
    mysql> INSERT INTO t1 VALUES(0, @json_value, @json_value);
    Query OK, 1 row affected (0.01 sec)
    

    INFORMATION_SCHEMA Tables¶

    This feature implements two new INFORMATION_SCHEMA tables.

    INFORMATION_SCHEMA.COMPRESSION_DICTIONARY¶

    Column Name Description
    ‘BIGINT(21)_UNSIGNED dict_version’ ‘dictionary version’
    ‘VARCHAR(64) dict_name’ ‘dictionary name’
    ‘BLOB dict_data’ ‘compression dictionary string’

    This table provides a view of the internal compression dictionary. The SUPER privilege is required to query it.

    INFORMATION_SCHEMA.COMPRESSION_DICTIONARY_TABLES¶

    Column Name Description
    ‘BIGINT(21)_UNSIGNED table_schema’ ‘table schema’
    ‘BIGINT(21)_UNSIGNED table_name’ ‘table ID from INFORMATION_SCHEMA.INNODB_SYS_TABLES’
    ‘BIGINT(21)_UNSIGNED column_name’ ‘column position (starts from 0 as in INFORMATION_SCHEMA.INNODB_SYS_COLUMNS)’
    ‘BIGINT(21)_UNSIGNED dict_name’ ‘dictionary ID’

    This table provides a view over the internal table that stores the mapping between the compression dictionaries and the columns using them. The SUPER privilege is require to query it.

    Limitations¶

    Compressed columns cannot be used in indices (neither on their own nor as parts of composite keys).

    Note

    CREATE TABLE t2 AS SELECT \* FROM t1 will create a new table with a compressed column, whereas CREATE TABLE t2 AS SELECT CONCAT(a,'') AS a FROM t1 will not create compressed columns.

    At the same time, after executing the CREATE TABLE t2 LIKE t1 statement, t2.a will have the COMPRESSED attribute.

    ALTER TABLE ... DISCARD/IMPORT TABLESPACE is not supported for tables with compressed columns. To export and import tablespaces with compressed columns, you uncompress them first with: ALTER TABLE ... MODIFY ... COLUMN_FORMAT DEFAULT.

    mysqldump command line parameters¶

    By default, with no additional options, mysqldump will generate a MySQL compatible SQL output.

    All /\*!50633 COLUMN_FORMAT COMPRESSED \*/ and /\*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <dictionary> \*/ won’t be in the dump.

    When a new option enable-compressed-columns is specified, all /\*!50633 COLUMN_FORMAT COMPRESSED \*/ will be left intact and all /\*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <dictionary> \*/ will be transformed into /\*!50633 COLUMN_FORMAT COMPRESSED \*/. In this mode, the dump will contain the necessary SQL statements to create compressed columns, but without dictionaries.

    When a new enable-compressed-columns-with-dictionaries option is specified, dump will contain all compressed column attributes and compression dictionary.

    Moreover, the following dictionary creation fragments will be added before CREATE TABLE statements which are going to use these dictionaries for the first time.

    /*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary>; */
    /*!50633 CREATE COMPRESSION_DICTIONARY <dictionary>(...); */
    

    Two new options add-drop-compression-dictionary and skip-add-drop-compression-dictionary will control if /\*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary> \*/ part from previous paragraph will be skipped or not. By default, add-drop-compression-dictionary the mode will be used.

    When both enable-compressed-columns-with-dictionaries and --tab=<dir> (separate file for each table) options are specified, necessary compression dictionaries will be created in each output file using the following fragment (regardless of the values of add-drop-compression-dictionary and skip-add-drop-compression-dictionary options).

    /*!50633 CREATE COMPRESSION_DICTIONARY IF NOT EXISTS <dictionary>(...); */
    

    Version specific information¶

    • Percona Server for MySQL 8.0.13-3: The feature was ported from Percona Server for MySQL 5.7.

    System variables¶

    innodb_compressed_columns_zip_level¶

    Option Description
    Command-line Yes
    Config file Yes
    Scope Global
    Dynamic Yes
    Data type Numeric
    Default 6
    Range 0-9

    This variable is used to specify the compression level used for compressed columns. Specifying 0 will use no compression, 1 the fastest, and 9 the best compression. The default value is 6.

    innodb_compressed_columns_threshold¶

    Option Description
    Command-line Yes
    Config file Yes
    Scope Global
    Dynamic Yes
    Data type Numeric
    Default 96
    Range 1 - 2^64-1 (or 2^32-1 for 32-bit release)

    By default, a value being inserted will be compressed if its length exceeds innodb_compressed_columns_threshold bytes. Otherwise, it will be stored in the raw (uncompressed) form.

    Please also note that because of the nature of some data, the compressed representation can be longer than the original value. In this case, it does not make sense to store such values in compressed form as Percona Server for MySQL would have to waste both memory space and CPU resources for unnecessary decompression. Therefore, even if the length of such non-compressible values exceeds innodb_compressed_columns_threshold, they will be stored in an uncompressed form (however, an attempt to compress them will still be made).

    This parameter can be tuned to skip unnecessary attempts of data compression for values that are known in advance by the user to have a bad compression ratio of their first N bytes.

    Contact us

    For free technical help, visit the Percona Community Forum.

    To report bugs or submit feature requests, open a JIRA ticket.

    For paid support and managed or consulting services , contact Percona Sales.


    Last update: 2023-01-12
    Percona LLC and/or its affiliates, © 2023
    Made with Material for MkDocs

    Cookie consent

    We use cookies to recognize your repeated visits and preferences, as well as to measure the effectiveness of our documentation and whether users find what they're searching for. With your consent, you're helping us to make our documentation better.