Analyzing Table Statistics¶
The xtrabackup binary can analyze InnoDB data files in read-only mode to give
statistics about them. To do this, you should use the xtrabackup --stats
option. You can combine this with the xtrabackup --tables
option to limit the files to examine. It also uses the xtrabackup --use-memory
option.
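For example, a statistics run against a server's data directory might look like the following. The data directory path, the table filter (a regular expression matched against database.table names), and the memory limit are illustrative; adjust them to your environment:

xtrabackup --stats --datadir=/var/lib/mysql --tables="^test[.]table1$" --use-memory=1G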
You can perform the analysis on a running server, with some chance of errors due
to the data being changed during the analysis, or you can analyze a backup copy of
the database. Either way, to use the statistics feature you need a clean copy of
the database, including correctly sized log files, so you need to run
xtrabackup --prepare
twice before using this functionality on a backup.
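For example, to analyze a backup rather than a running server, you might prepare the backup twice and then point the statistics run at the backup directory (the path is illustrative):

xtrabackup --prepare --target-dir=/data/backups/mysql/
xtrabackup --prepare --target-dir=/data/backups/mysql/
xtrabackup --stats --datadir=/data/backups/mysql/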
The result of running on a backup might look like the following:
<INDEX STATISTICS>
  table: test/table1, index: PRIMARY, space id: 12, root page 3
  estimated statistics in dictionary:
    key vals: 25265338, leaf pages 497839, size pages 498304
  real statistics:
    level 2 pages: pages=1, data=5395 bytes, data/pages=32%
    level 1 pages: pages=415, data=6471907 bytes, data/pages=95%
       leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91%
This can be interpreted as follows:
- The first line simply shows the table and index name and its internal identifiers. If you see an index named GEN_CLUST_INDEX, that is the table's clustered index, automatically created because you did not explicitly create a PRIMARY KEY.
- The estimated statistics in dictionary information is similar to the data that's gathered through ANALYZE TABLE inside of InnoDB to be stored as estimated cardinality statistics and passed to the query optimizer.
- The real statistics information is the result of scanning the data pages and computing exact information about the index.
- The level <X> pages: output means that the line shows information about pages at that level in the index tree. The larger <X> is, the farther it is from the leaf pages, which are level 0. The first line is the root page.
- The leaf pages output shows the leaf pages, of course. This is where the table's data is stored.
- The external pages: output (not shown) shows large external pages that hold values too long to fit in the row itself, such as long BLOB and TEXT values.
- The recs is the real number of records (rows) in leaf pages.
- The pages is the page count.
- The data is the total size of the data in the pages, in bytes.
- The data/pages is calculated as (data / (pages * PAGE_SIZE)) * 100%. It will never reach 100% because of space reserved for page headers and footers; a worked example follows this list.
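For example, taking the leaf pages line from the sample output above and assuming the default 16KB InnoDB page size:

7492026403 / (497839 * 16384) * 100 ≈ 91.9%

which is consistent with the data/pages=91% figure shown (the output prints a whole-number percentage).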
Note
A more detailed example is presented in a MySQL Performance Blog post: http://www.mysqlperformanceblog.com/2009/09/14/statistics-of-innodb-tables-and-indexes-available-in-xtrabackup/
Script to Format Output¶
The following script can be used to summarize and tabulate the output of the statistics information:
tabulate-xtrabackup-stats.pl
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
my $script_version = "0.1";

my $PG_SIZE = 16_384; # InnoDB defaults to 16k pages, change if needed.
my ($cur_idx, $cur_tbl);
my (%idx_stats, %tbl_stats);
my ($max_tbl_len, $max_idx_len) = (0, 0);

while ( my $line = <> ) {
   # "table: ..., index: ..." lines introduce a new index; remember its names.
   if ( my ($t, $i) = $line =~ m/table: (.*), index: (.*), space id:/ ) {
      $t =~ s!/!.!;
      $cur_tbl = $t;
      $cur_idx = $i;
      if ( length($i) > $max_idx_len ) {
         $max_idx_len = length($i);
      }
      if ( length($t) > $max_tbl_len ) {
         $max_tbl_len = length($t);
      }
   }
   # "key vals: ..." lines carry the estimated (dictionary) statistics.
   elsif ( my ($kv, $lp, $sp) = $line =~ m/key vals: (\d+), \D*(\d+), \D*(\d+)/ ) {
      @{$idx_stats{$cur_tbl}->{$cur_idx}}{qw(est_kv est_lp est_sp)} = ($kv, $lp, $sp);
      $tbl_stats{$cur_tbl}->{est_kv} += $kv;
      $tbl_stats{$cur_tbl}->{est_lp} += $lp;
      $tbl_stats{$cur_tbl}->{est_sp} += $sp;
   }
   # "level N pages: ..." and "leaf pages: ..." lines carry the real statistics.
   elsif ( my ($l, $pages, $bytes) = $line =~ m/(?:level (\d+)|leaf) pages:.*pages=(\d+), data=(\d+) bytes/ ) {
      $l ||= 0;
      $idx_stats{$cur_tbl}->{$cur_idx}->{real_pages} += $pages;
      $idx_stats{$cur_tbl}->{$cur_idx}->{real_bytes} += $bytes;
      $tbl_stats{$cur_tbl}->{real_pages} += $pages;
      $tbl_stats{$cur_tbl}->{real_bytes} += $bytes;
   }
}

my $hdr_fmt = "%${max_tbl_len}s %${max_idx_len}s %9s %10s %10s\n";
my @headers = qw(TABLE INDEX TOT_PAGES FREE_PAGES PCT_FULL);
printf $hdr_fmt, @headers;

# Print a summary row per table (empty INDEX column), then one row per index.
my $row_fmt = "%${max_tbl_len}s %${max_idx_len}s %9d %10d %9.1f%%\n";
foreach my $t ( sort keys %tbl_stats ) {
   my $tbl = $tbl_stats{$t};
   printf $row_fmt, $t, "", $tbl->{est_sp}, $tbl->{est_sp} - $tbl->{real_pages},
      $tbl->{real_bytes} / ($tbl->{real_pages} * $PG_SIZE) * 100;
   foreach my $i ( sort keys %{$idx_stats{$t}} ) {
      my $idx = $idx_stats{$t}->{$i};
      printf $row_fmt, $t, $i, $idx->{est_sp}, $idx->{est_sp} - $idx->{real_pages},
         $idx->{real_bytes} / ($idx->{real_pages} * $PG_SIZE) * 100;
   }
}
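To use the script, save it as tabulate-xtrabackup-stats.pl and feed it the statistics output, either from a file or over a pipe. The file names here are only examples:

xtrabackup --stats --datadir=/data/backups/mysql/ > stats.txt
perl tabulate-xtrabackup-stats.pl stats.txt

If your server uses an InnoDB page size other than 16KB, change the $PG_SIZE variable at the top of the script to match.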
Sample Script Output¶
The output of the above Perl script, when run against the sample shown in the previously mentioned blog post, will appear as follows:
          TABLE           INDEX TOT_PAGES FREE_PAGES   PCT_FULL
art.link_out104                    832383      38561      86.8%
art.link_out104         PRIMARY    498304         49      91.9%
art.link_out104       domain_id     49600       6230      76.9%
art.link_out104     domain_id_2     26495       3339      89.1%
art.link_out104 from_message_id     28160        142      96.3%
art.link_out104    from_site_id     38848       4874      79.4%
art.link_out104   revert_domain    153984      19276      71.4%
art.link_out104    site_message     36992       4651      83.4%
The columns are the table and index name, followed by the total number of pages in
that index, the number of pages not actually occupied by data, and the number of
bytes of real data as a percentage of the total size of the pages. The first line
in the above output, in which the INDEX column is empty, is a summary of the
entire table.
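As a worked check of this arithmetic, and assuming the 16KB page size used by the script, the figures from the sample <INDEX STATISTICS> output earlier on this page reproduce the PRIMARY row above:

TOT_PAGES  = 498304                           (estimated "size pages")
real pages = 1 + 415 + 497839 = 498255        (pages counted at every level)
FREE_PAGES = 498304 - 498255 = 49
PCT_FULL   = (5395 + 6471907 + 7492026403) / (498255 * 16384) * 100 ≈ 91.9%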