Percona Server for MySQL
    Data loading

    By default, MyRocks is configured for short transactions rather than for data loading. MyRocks provides a few special session variables that can speed up data loading dramatically.

    Sorted bulk loading

    Use this method if your data is guaranteed to be loaded in primary key order. It works by dropping any secondary keys first, loading the data into the table in primary key order, and then restoring the secondary keys with Fast Secondary Index Creation.

    Creating secondary indexes

    When loading data into empty tables, it is highly recommended to drop all secondary indexes first, load the data, and then add the secondary indexes back after loading finishes. MyRocks has a feature called Fast Secondary Index Creation, which is used automatically when you execute CREATE INDEX or ALTER TABLE ... ADD INDEX. With Fast Secondary Index Creation, the secondary index entries are written directly to the bottommost RocksDB level, bypassing compaction. This significantly reduces the total write volume and the CPU time spent decompressing and compressing data on higher levels.
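
    A minimal sketch of this sequence follows. The table t1, its column b, and the index name idx_b are hypothetical; the load step itself is covered in Loading data.

    ```sql
    -- Hypothetical table: t1 (id INT PRIMARY KEY, b INT) with a secondary index idx_b.

    -- 1. Drop the secondary index so only the primary key is written during the load.
    ALTER TABLE t1 DROP INDEX idx_b;

    -- 2. Load the data in primary key order (see Loading data).

    -- 3. Re-create the index; MyRocks applies Fast Secondary Index Creation automatically.
    ALTER TABLE t1 ADD INDEX idx_b (b);
    -- Equivalent form: CREATE INDEX idx_b ON t1 (b);
    ```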

    Loading data¶

    As described above, loading data is highly recommended for tables with primary key only (no secondary keys), with all secondary indexes added after loading data.

    When loading data into MyRocks tables, there are two recommended session variables:

    SET session sql_log_bin=0;
    SET session rocksdb_bulk_load=1;
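
    Put together, a sorted bulk load from a file might look like the sketch below. The table t1 and the path /tmp/t1_sorted.csv are hypothetical; the file is assumed to be sorted by the primary key already, and the server's secure_file_priv setting must permit reading it.

    ```sql
    -- Assumption: /tmp/t1_sorted.csv (hypothetical) is already sorted by t1's primary key.
    SET session sql_log_bin=0;        -- do not write the loaded rows to the binary log
    SET session rocksdb_bulk_load=1;  -- enter bulk load mode

    LOAD DATA INFILE '/tmp/t1_sorted.csv'
      INTO TABLE t1
      FIELDS TERMINATED BY ',';

    SET session rocksdb_bulk_load=0;  -- end bulk load mode; the last SST file is added
    SET session sql_log_bin=1;        -- restore binary logging for this session
    ```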
    

    When converting large MyISAM or InnoDB tables to MyRocks, whether with ALTER TABLE or with INSERT INTO ... SELECT, enable bulk loading as shown below. Without it, converting a sufficiently large table can make the server consume all available memory and be terminated by the OOM killer:

    SET session sql_log_bin=0;
    SET session rocksdb_bulk_load=1;
    ALTER TABLE large_myisam_table ENGINE=RocksDB;
    SET session rocksdb_bulk_load=0;
    

    Setting sql_log_bin=0 prevents the session's writes from being recorded in the binary log, so the loaded data is not replicated through it.

    With rocksdb_bulk_load set to 1, MyRocks enters a special mode that writes all inserts directly into the bottommost RocksDB level, skipping the MemTable and the compactions that would otherwise follow. This is a very efficient way to load data.
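
    For the INSERT INTO ... SELECT form of the conversion, one possible sketch is shown below. The table names and the primary key column id are hypothetical, and the source table is assumed to have no secondary indexes (otherwise, drop them from the new table first, as described in Creating secondary indexes). The explicit ORDER BY keeps the rows in primary key order, which sorted bulk loading requires; a plain scan of a MyISAM table does not guarantee that order.

    ```sql
    -- Hypothetical source: large_innodb_table, primary key id, no secondary indexes.
    CREATE TABLE large_rocksdb_table LIKE large_innodb_table;
    ALTER TABLE large_rocksdb_table ENGINE=RocksDB;  -- cheap while the new table is empty

    SET session sql_log_bin=0;
    SET session rocksdb_bulk_load=1;
    INSERT INTO large_rocksdb_table
      SELECT * FROM large_innodb_table ORDER BY id;  -- rows arrive in primary key order
    SET session rocksdb_bulk_load=0;
    SET session sql_log_bin=1;
    ```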

    The rocksdb_bulk_load mode has a few restrictions:

    • None of the data being bulk loaded can overlap with existing data in the table. The easiest way to ensure this is to always bulk load into an empty table. However, the mode does allow loading some data into the table, performing other operations, and then returning to bulk load additional data, as long as there is no overlap between what is being loaded and what already exists.

    • The data may not be visible until bulk load mode ends (that is, until rocksdb_bulk_load is set back to 0). Bulk loading works by building SST files that are later added to the database as-is. Until a particular SST file has been added, its data is not visible to the rest of the system, so a SELECT on a table that is currently being bulk loaded shows only older data and will likely miss the most recently added rows. Ending bulk load mode adds the most recent SST file. When bulk loading multiple tables, starting on a new table triggers the addition of the most recent SST file; as a result, interleaving INSERT statements to two or more tables during bulk load mode is inadvisable.
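
    The following sketch loads two hypothetical tables, t1 and t2, one after the other instead of interleaving their INSERT statements. It also appends to t1 only keys above the ones it already holds, so the new data does not overlap with existing rows. The table names, key ranges, and values are illustrative assumptions.

    ```sql
    -- Assumption: t1 already contains ids 1..1000 and t2 is empty (both hypothetical).
    SET session rocksdb_bulk_load=1;

    -- Finish all inserts for t1 first; the new ids do not overlap the existing ones.
    INSERT INTO t1 (id, val) VALUES (1001, 'a'), (1002, 'b'), (1003, 'c');

    -- Moving on to t2 adds the pending SST file for t1; avoid switching back and forth.
    INSERT INTO t2 (id, val) VALUES (1, 'x'), (2, 'y');

    SET session rocksdb_bulk_load=0;  -- ends bulk load mode and adds the last SST file

    -- The rows loaded above are visible now that bulk load mode has ended.
    SELECT COUNT(*) FROM t1;
    ```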

    By default, the rocksdb_bulk_load mode expects all data to be inserted in primary key order (or in reverse order). If the data is in reverse order (that is, descending on a normally ordered primary key or ascending on a reverse-ordered primary key), the rows are cached in chunks and their order is switched to match the expected order.

    Inserting one or more rows out of order results in an error and may leave some of the data inserted in the table and some not. To resolve the problem, fix the order of the inserted data, truncate the table, and restart the load.
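
    If an out-of-order insert fails, recovery could follow the steps above as in this sketch. The tables t1 and t1_source and the key column id are hypothetical; sorting with ORDER BY is one way to fix the data order when the source is a table.

    ```sql
    -- End bulk load mode before cleaning up the partially loaded table.
    SET session rocksdb_bulk_load=0;
    TRUNCATE TABLE t1;

    -- Restart the load with the rows explicitly sorted by the primary key.
    SET session rocksdb_bulk_load=1;
    INSERT INTO t1 SELECT * FROM t1_source ORDER BY id;
    SET session rocksdb_bulk_load=0;
    ```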

    Unsorted bulk loading

    Use this method if your data is not ordered by primary key. With this method, secondary keys do not need to be dropped and restored. However, primary key writes no longer go directly to SST files; they are first written to temporary files for sorting, so this method has an extra cost.

    To allow for loading unsorted data:

    SET session sql_log_bin=0;
    SET session rocksdb_bulk_load_allow_unsorted=1;
    SET session rocksdb_bulk_load=1;
    ...
    SET session rocksdb_bulk_load=0;
    SET session rocksdb_bulk_load_allow_unsorted=0;
    

    Note that rocksdb_bulk_load_allow_unsorted can only be changed while rocksdb_bulk_load is disabled (set to 0). When rocksdb_bulk_load_allow_unsorted is enabled, all input data goes through an intermediate step that writes the rows to temporary SST files, sorts the rows in primary key order, and then writes them to the final SST files in the correct order.
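
    As a concrete sketch, the session below loads a hypothetical unsorted file into a table t2 that keeps its secondary indexes. Note the order of the SET statements: rocksdb_bulk_load_allow_unsorted is enabled before rocksdb_bulk_load is turned on, and disabled only after it is turned off. The file path and table are assumptions.

    ```sql
    -- Hypothetical file and table; the rows in the file are in no particular key order.
    SET session sql_log_bin=0;
    SET session rocksdb_bulk_load_allow_unsorted=1;  -- only changeable while bulk load is off
    SET session rocksdb_bulk_load=1;

    LOAD DATA INFILE '/tmp/t2_unsorted.csv'
      INTO TABLE t2
      FIELDS TERMINATED BY ',';

    SET session rocksdb_bulk_load=0;                 -- ending bulk load mode finalizes the load
    SET session rocksdb_bulk_load_allow_unsorted=0;  -- allowed again now that bulk load is off
    SET session sql_log_bin=1;
    ```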

    Other approaches

    If rocksdb_commit_in_the_middle is enabled, MyRocks implicitly commits every rocksdb_bulk_load_size records (the default is 1,000) in the middle of your transaction. If data loading fails in the middle of a statement (LOAD DATA or a bulk INSERT), the rows are not entirely rolled back; some of them remain stored in the table. To restart data loading, you need to truncate the table and load the data again.
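
    A minimal sketch of enabling this behavior for a single session follows. The table t3 and the file path are hypothetical, and 1000 is simply the default batch size stated above, set explicitly for clarity.

    ```sql
    -- Commit implicitly every rocksdb_bulk_load_size rows during the statement.
    SET session rocksdb_commit_in_the_middle=1;
    SET session rocksdb_bulk_load_size=1000;

    LOAD DATA INFILE '/tmp/t3.csv'   -- hypothetical file and table
      INTO TABLE t3
      FIELDS TERMINATED BY ',';

    SET session rocksdb_commit_in_the_middle=0;

    -- If the LOAD DATA failed part-way, some rows are already committed:
    -- run TRUNCATE TABLE t3; and then repeat the load.
    ```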

    Warning

    If you load a large amount of data without enabling rocksdb_bulk_load or rocksdb_commit_in_the_middle, make sure each transaction is small enough, because all modifications of an ongoing transaction are kept in memory.
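
    Without either variable, one way to keep transactions small is to copy the data in primary key ranges and commit after each range, as in the sketch below. The tables t_src and t_dst, the key column id, and the range width of 100,000 ids are hypothetical; pick a range whose rows fit comfortably in memory.

    ```sql
    -- Hypothetical: copy t_src into t_dst in primary key ranges, one transaction per range.
    START TRANSACTION;
    INSERT INTO t_dst SELECT * FROM t_src WHERE id BETWEEN 1 AND 100000;
    COMMIT;

    START TRANSACTION;
    INSERT INTO t_dst SELECT * FROM t_src WHERE id BETWEEN 100001 AND 200000;
    COMMIT;

    -- Repeat for the remaining ranges.
    ```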

    Other reading

    • Data Loading - this document has been used as a source for writing this documentation

    • ALTER TABLE … ENGINE=ROCKSDB uses too much memory

    Contact us

    For free technical help, visit the Percona Community Forum.

    To report bugs or submit feature requests, open a JIRA ticket.

    For paid support and managed or consulting services, contact Percona Sales.


    Last update: 2023-01-12
    Percona LLC and/or its affiliates, © 2023