.. program:: pt-find

==================
:program:`pt-find`
==================

NAME
====

:program:`pt-find` - Find MySQL tables and execute actions, like GNU find.

SYNOPSIS
========

Usage
-----

::

  pt-find [OPTIONS] [DATABASES]

:program:`pt-find` searches for MySQL tables and executes actions, like GNU find.  The
default action is to print the database and table name.

Find all tables created more than a day ago, which use the MyISAM engine, and
print their names:

.. code-block:: bash

   pt-find --ctime +1 --engine MyISAM

Find MyISAM tables and convert them to InnoDB:

.. code-block:: bash

   pt-find --engine MyISAM --exec "ALTER TABLE %D.%N ENGINE=InnoDB"

Find tables created by a process that no longer exists, following the
name_sid_pid naming convention, and remove them.

.. code-block:: bash

   pt-find --connection-id '\D_\d+_(\d+)$' --server-id '\D_(\d+)_\d+$' --exec-plus "DROP TABLE %s"

Find empty tables in the test and junk databases, and delete them:

.. code-block:: bash

   pt-find --empty junk test --exec-plus "DROP TABLE %s"

Find tables more than five gigabytes in total size:

.. code-block:: bash

   pt-find --tablesize +5G

Find all tables and print their total data and index size, and sort largest
tables first (sort is a different program, by the way).

.. code-block:: bash

   pt-find --printf "%T\t%D.%N\n" | sort -rn

As above, but this time, insert the data back into the database for posterity:

.. code-block:: bash

   pt-find --noquote --exec "INSERT INTO sysdata.tblsize(db, tbl, size) VALUES('%D', '%N', %T)"

RISKS
=====

Percona Toolkit is mature, proven in the real world, and well tested,
but all database tools can pose a risk to the system and the database
server.  Before using this tool, please:

* Read the tool's documentation

* Review the tool's known "BUGS"

* Test the tool on a non-production server

* Backup your production server and verify the backups

DESCRIPTION
===========

:program:`pt-find` looks for MySQL tables that pass the tests you specify, and executes
the actions you specify.  The default action is to print the database and table
name to STDOUT.

:program:`pt-find` is simpler than GNU find.  It doesn't allow you to specify
complicated expressions on the command line.

:program:`pt-find` uses SHOW TABLES when possible, and SHOW TABLE STATUS when needed.

OPTION TYPES
============

There are three types of options: normal options, which determine some behavior
or setting; tests, which determine whether a table should be included in the
list of tables found; and actions, which do something to the tables :program:`pt-find`
finds.

:program:`pt-find` uses standard Getopt::Long option parsing, so you should use double
dashes in front of long option names, unlike GNU find.

OPTIONS
=======

This tool accepts additional command-line arguments.  Refer to the
"SYNOPSIS" and usage information for details.

.. option:: --ask-pass

 Prompt for a password when connecting to MySQL.

.. option:: --[no]buffer-stdout

 default: yes

 Enables STDOUT buffering. Disable this option if you want to see live progress
 when using output post-processing tools such as ``tee`` or ``kubectl logs``.

.. option:: --case-insensitive

 Specifies that all regular expression searches are case-insensitive.

.. option:: --charset

 short form: -A; type: string

 Default character set.  If the value is utf8, sets Perl's binmode on
 STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET
 NAMES UTF8 after connecting to MySQL.  Any other value sets binmode on STDOUT
 without the utf8 layer, and runs SET NAMES after connecting to MySQL.

.. option:: --config

 type: Array

 Read this comma-separated list of config files; if specified, this must be the
 first option on the command line.

.. option:: --database

 short form: -D; type: string

 Connect to this database.

.. option:: --day-start

 Measure times (for :option:`--mmin`, etc) from the beginning of today rather than
 from the current time.

.. option:: --defaults-file

 short form: -F; type: string

 Only read mysql options from the given file.  You must give an absolute
 pathname.

.. option:: --help

 Show help and exit.

.. option:: --host

 short form: -h; type: string

 Connect to host.

.. option:: --mysql_ssl

 short form: -s; type: int

 Create SSL MySQL connection.

.. option:: --or

 Combine tests with OR, not AND.

 By default, tests are evaluated as though there were an AND between them.  This
 option switches it to OR.

 Option parsing is not implemented by :program:`pt-find` itself, so you cannot specify
 complicated expressions with parentheses and mixtures of OR and AND.

.. option:: --password

 short form: -p; type: string

 Password to use when connecting.
 If password contains commas they must be escaped with a backslash: "exam\,ple"

.. option:: --pid

 type: string

 Create the given PID file.  The tool won't start if the PID file already
 exists and the PID it contains is different than the current PID.  However,
 if the PID file exists and the PID it contains is no longer running, the
 tool will overwrite the PID file with the current PID.  The PID file is
 removed automatically when the tool exits.

.. option:: --port

 short form: -P; type: int

 Port number to use for connection.

.. option:: --[no]quote

 default: yes

 Quotes MySQL identifier names with MySQL's standard backtick character.

 Quoting happens after tests are run, and before actions are run.

.. option:: --set-vars

 type: Array

 Set the MySQL variables in this comma-separated list of ``variable=value`` pairs.

 By default, the tool sets:

 .. code-block:: bash

     wait_timeout=10000

 Variables specified on the command line override these defaults.  For
 example, specifying ``--set-vars wait_timeout=500`` overrides the defaultvalue of ``10000``.

 The tool prints a warning and continues if a variable cannot be set.

.. option:: --socket

 short form: -S; type: string

 Socket file to use for connection.

.. option:: --user

 short form: -u; type: string

 User for login if not current user.

.. option:: --version

 Show version and exit.

.. option:: --[no]version-check

 default: yes

 Check for the latest version of Percona Toolkit, MySQL, and other programs.

 This is a standard "check for updates automatically" feature, with two
 additional features.  First, the tool checks its own version and also the
 versions of the following software: operating system, Percona Monitoring and
 Management (PMM), MySQL, Perl, MySQL driver for Perl (DBD::mysql), and
 Percona Toolkit. Second, it checks for and warns about versions with known
 problems. For example, MySQL 5.5.25 had a critical bug and was re-released
 as 5.5.25a.

 A secure connection to Percona’s Version Check database server is done to
 perform these checks. Each request is logged by the server, including software
 version numbers and unique ID of the checked system. The ID is generated by the
 Percona Toolkit installation script or when the Version Check database call is
 done for the first time.

 Any updates or known problems are printed to STDOUT before the tool's normal
 output.  This feature should never interfere with the normal operation of the
 tool.

 For more information, visit `https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>`_.

TESTS
=====

Most tests check some criterion against a column of SHOW TABLE STATUS output.
Numeric arguments can be specified as +n for greater than n, -n for less than n,
and n for exactly n.  All numeric options can take an optional suffix multiplier
of k, M or G (1_024, 1_048_576, and 1_073_741_824 respectively).  All patterns
are Perl regular expressions (see 'man perlre') unless specified as SQL LIKE
patterns.

Dates and times are all measured relative to the same instant, when :program:`pt-find`
first asks the database server what time it is.  All date and time manipulation
is done in SQL, so if you say to find tables modified 5 days ago, that
translates to SELECT DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 5 DAY).  If you
specify :option:`--day-start`, if course it's relative to CURRENT_DATE instead.

However, table sizes and other metrics are not consistent at an instant in
time.  It can take some time for MySQL to process all the SHOW queries, and
:program:`pt-find` can't do anything about that.  These measurements are as of the
time they're taken.

If you need some test that's not in this list, file a feature request at
`https://jira.percona.com/projects/PT <https://jira.percona.com/projects/PT>`_.

.. option:: --autoinc

 type: string; group: Tests

 Table's next AUTO_INCREMENT is n.  This tests the Auto_increment column.

.. option:: --avgrowlen

 type: size; group: Tests

 Table avg row len is n bytes.  This tests the Avg_row_length column.
 The specified size can be "NULL" to test where Avg_row_length IS NULL.

.. option:: --checksum

 type: string; group: Tests

 Table checksum is n.  This tests the Checksum column.

.. option:: --cmin

 type: size; group: Tests

 Table was created n minutes ago.  This tests the Create_time column.

.. option:: --collation

 type: string; group: Tests

 Table collation matches pattern.  This tests the Collation column.

.. option:: --column-name

 type: string; group: Tests

 A column name in the table matches pattern.

.. option:: --column-type

 type: string; group: Tests

 A column in the table matches this type (case-insensitive).

 Examples of types are: varchar, char, int, smallint, bigint, decimal, year,
 timestamp, text, enum.

.. option:: --comment

 type: string; group: Tests

 Table comment matches pattern.  This tests the Comment column.

.. option:: --connection-id

 type: string; group: Tests

 Table name has nonexistent MySQL connection ID.  This tests the table name for
 a pattern.  The argument to this test must be a Perl regular expression that
 captures digits like this: (\d+).  If the table name matches the pattern,
 these captured digits are taken to be the MySQL connection ID of some process.
 If the connection doesn't exist according to SHOW FULL PROCESSLIST, the test
 returns true.  If the connection ID is greater than :program:`pt-find`'s own
 connection ID, the test returns false for safety.

 Why would you want to do this?  If you use MySQL statement-based replication,
 you probably know the trouble temporary tables can cause.  You might choose to
 work around this by creating real tables with unique names, instead of
 temporary tables.  One way to do this is to append your connection ID to the
 end of the table, thusly: scratch_table_12345.  This assures the table name is
 unique and lets you have a way to find which connection it was associated
 with.  And perhaps most importantly, if the connection no longer exists, you
 can assume the connection died without cleaning up its tables, and this table
 is a candidate for removal.

 The argument to use for :option:`--connection-id` is "\D_(\d+)$".  That finds tables
 with a series of numbers at the end, preceded by an underscore and some
 non-number character (the latter criterion prevents from examining tables
 with a date at the end, which people tend to do: baron_scratch_2007_05_07 for
 example).  It's better to keep the scratch tables separate.

 If you do this, make sure the user :program:`pt-find` runs as has the PROCESS privilege!
 Otherwise it will only see connections from the same user, and might think some
 tables are ready to remove when they're still in use.  For safety, :program:`pt-find`
 checks this for you.

 See also :option:`--server-id`.

.. option:: --createopts

 type: string; group: Tests

 Table create option matches pattern.  This tests the Create_options column.

.. option:: --ctime

 type: size; group: Tests

 Table was created n days ago.  This tests the Create_time column.

.. option:: --datafree

 type: size; group: Tests

 Table has n bytes of free space.  This tests the Data_free column.
 The specified size can be "NULL" to test where Data_free IS NULL.

.. option:: --datasize

 type: size; group: Tests

 Table data uses n bytes of space.  This tests the Data_length column.
 The specified size can be "NULL" to test where Data_length IS NULL.

 Note: Starting from MySQL 8.0, empty tables return 0 instead of NULL.

.. option:: --dblike

 type: string; group: Tests

 Database name matches SQL LIKE pattern.

.. option:: --dbregex

 type: string; group: Tests

 Database name matches this pattern.

.. option:: --empty

 group: Tests

 Table has no rows.  This tests the Rows column.

.. option:: --engine

 type: string; group: Tests

 Table storage engine matches this pattern.  This tests the Engine column, or in
 earlier versions of MySQL, the Type column.

.. option:: --function

 type: string; group: Tests

 Function definition matches pattern.

.. option:: --indexsize

 type: size; group: Tests

 Table indexes use n bytes of space.  This tests the Index_length column.
 The specified size can be "NULL" to test where Index_length IS NULL.

.. option:: --kmin

 type: size; group: Tests

 Table was checked n minutes ago.  This tests the Check_time column.

.. option:: --ktime

 type: size; group: Tests

 Table was checked n days ago.  This tests the Check_time column.

.. option:: --mmin

 type: size; group: Tests

 Table was last modified n minutes ago.  This tests the Update_time column.

.. option:: --mtime

 type: size; group: Tests

 Table was last modified n days ago.  This tests the Update_time column.

.. option:: --procedure

 type: string; group: Tests

 Procedure definition matches pattern.

.. option:: --rowformat

 type: string; group: Tests

 Table row format matches pattern.  This tests the Row_format column.

.. option:: --rows

 type: size; group: Tests

 Table has n rows.  This tests the Rows column.
 The specified size can be "NULL" to test where Rows IS NULL.

.. option:: --server-id

 type: string; group: Tests

 Table name contains the server ID.  If you create temporary tables with the
 naming convention explained in :option:`--connection-id`, but also add the server ID of the
 server on which the tables are created, then you can use this pattern match to
 ensure tables are dropped only on the server they're created on.  This prevents
 a table from being accidentally dropped on a replica while it's in use (provided
 that your server IDs are all unique, which they should be for replication to
 work).

 For example, on the replication source (server ID 22) you create a table called
 scratch_table_22_12345.  If you see this table on the replica (server ID 23), you
 might think it can be dropped safely if there's no such connection 12345.  But
 if you also force the name to match the server ID with ``--server-id '\D_(\d+)_\d+$'``,
 the table won't be dropped on the replica.

.. option:: --tablesize

 type: size; group: Tests

 Table uses n bytes of space.  This tests the sum of the Data_length and
 Index_length columns.

.. option:: --tbllike

 type: string; group: Tests

 Table name matches SQL LIKE pattern.

.. option:: --tblregex

 type: string; group: Tests

 Table name matches this pattern.

.. option:: --tblversion

 type: size; group: Tests

 Table version is n.  This tests the Version column.

.. option:: --trigger

 type: string; group: Tests

 Trigger action statement matches pattern.

.. option:: --trigger-table

 type: string; group: Tests

 :option:`--trigger` is defined on table matching pattern.

.. option:: --view

 type: string; group: Tests

 CREATE VIEW matches this pattern.

ACTIONS
=======

The :option:`--exec-plus` action happens after everything else, but otherwise actions
happen in an indeterminate order.  If you need determinism, file a feature request
at `https://jira.percona.com/projects/PT <https://jira.percona.com/projects/PT>`_.

.. option:: --exec

 type: string; group: Actions

 Execute this SQL with each item found.  The SQL can contain escapes and
 formatting directives (see :option:`--printf`).

.. option:: --exec-dsn

 type: string; group: Actions

 Specify a DSN in key-value format to use when executing SQL with :option:`--exec` and
 :option:`--exec-plus`.  Any values not specified are inherited from command-line
 arguments.

.. option:: --exec-plus

 type: string; group: Actions

 Execute this SQL with all items at once.  This option is unlike :option:`--exec`.  There
 are no escaping or formatting directives; there is only one special placeholder
 for the list of database and table names, %s.  The list of tables found will be
 joined together with commas and substituted wherever you place %s.

 You might use this, for example, to drop all the tables you found:

 .. code-block:: bash

     DROP TABLE %s

 This is sort of like GNU find's "-exec command {} +" syntax.

.. option:: --print

 group: Actions

 Print the database and table name, followed by a newline.  This is the default
 action if no other action is specified.

.. option:: --printf

 type: string; group: Actions

 Print format on the standard output, interpreting '\' escapes and '%'
 directives.  Escapes are backslashed characters, like \n and \t.  Perl
 interprets these, so you can use any escapes Perl knows about.  Directives are
 replaced by %s, and as of this writing, you can't add any special formatting
 instructions, like field widths or alignment.

 Here is a list of the directives.  Note that most of them simply come from
 columns of SHOW TABLE STATUS.  If the column is NULL or doesn't exist, you get
 an empty string in the output.  A % character followed by any character not in
 the following list is discarded (but the other character is printed).

 .. code-block:: bash

     CHAR DATA SOURCE        NOTES
     ---- ------------------ ------------------------------------------
     a    Auto_increment
     A    Avg_row_length
     c    Checksum
     C    Create_time
     D    Database           The database name in which the table lives
     d    Data_length
     E    Engine             In older versions of MySQL, this is Type
     F    Data_free
     f    Innodb_free        Parsed from the Comment field
     I    Index_length
     K    Check_time
     L    Collation
     M    Max_data_length
     N    Name
     O    Comment
     P    Create_options
     R    Row_format
     S    Rows
     T    Table_length       Data_length+Index_length
     U    Update_time
     V    Version


DSN OPTIONS
===========

These DSN options are used to create a DSN.  Each option is given like
``option=value``.  The options are case-sensitive, so P and p are not the
same option.  There cannot be whitespace before or after the ``=`` and
if the value contains whitespace it must be quoted.  DSN options are
comma-separated.  See the percona-toolkit manpage for full details.

* A

 dsn: charset; copy: yes

 Default character set.

* D

 dsn: database; copy: yes

 Default database.

* F

 dsn: mysql_read_default_file; copy: yes

 Only read default options from the given file

* h

 dsn: host; copy: yes

 Connect to host.

* p

 dsn: password; copy: yes

 Password to use when connecting.
 If password contains commas they must be escaped with a backslash: "exam\,ple"

* P

 dsn: port; copy: yes

 Port number to use for connection.

* S

 dsn: mysql_socket; copy: yes

 Socket file to use for connection.

* u

 dsn: user; copy: yes

 User for login if not current user.

* s

 dsn: mysql_ssl; copy: yes

 Create SSL connection

ENVIRONMENT
===========

The environment variable ``PTDEBUG`` enables verbose debugging output to STDERR.
To enable debugging and capture all output to a file, run the tool like:

.. code-block:: bash

    PTDEBUG=1 pt-find ... > FILE 2>&1

Be careful: debugging output is voluminous and can generate several megabytes
of output.

ATTENTION
=========

Using <PTDEBUG> might expose passwords. When debug is enabled, all command line
parameters are shown in the output.

SYSTEM REQUIREMENTS
===================

You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.

BUGS
====

For a list of known bugs, see `https://jira.percona.com/projects/PT/issues <https://jira.percona.com/projects/PT/issues>`_.

Please report bugs at `https://jira.percona.com/projects/PT <https://jira.percona.com/projects/PT>`_.
Include the following information in your bug report:

* Complete command-line used to run the tool

* Tool :option:`--version`

* MySQL version of all servers involved

* Output from the tool including STDERR

* Input files (log/dump/config files, etc.)

If possible, include debugging output by running the tool with ``PTDEBUG``;
see "ENVIRONMENT".

DOWNLOADING
===========

Visit `http://www.percona.com/software/percona-toolkit/ <http://www.percona.com/software/percona-toolkit/>`_ to download the
latest release of Percona Toolkit.  Or, get the latest release from the
command line:

.. code-block:: bash

    wget percona.com/get/percona-toolkit.tar.gz

    wget percona.com/get/percona-toolkit.rpm

    wget percona.com/get/percona-toolkit.deb

You can also get individual tools from the latest release:

.. code-block:: bash

    wget percona.com/get/TOOL

Replace ``TOOL`` with the name of any tool.

AUTHORS
=======

Baron Schwartz

ABOUT PERCONA TOOLKIT
=====================

This tool is part of Percona Toolkit, a collection of advanced command-line
tools for MySQL developed by Percona.  Percona Toolkit was forked from two
projects in June, 2011: Maatkit and Aspersa.  Those projects were created by
Baron Schwartz and primarily developed by him and Daniel Nichter.  Visit
`http://www.percona.com/software/ <http://www.percona.com/software/>`_ to learn about other free, open-source
software from Percona.

COPYRIGHT, LICENSE, AND WARRANTY
================================

This program is copyright 2011-2024 Percona LLC and/or its affiliates,
2007-2011 Baron Schwartz.

THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License.  On UNIX and similar
systems, you can issue \`man perlgpl' or \`man perlartistic' to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA  02111-1307  USA.

VERSION
=======

:program:`pt-find` 3.7.1