TokuDB Background ANALYZE TABLE¶
Prior to 5.6.27-76.0
release there was no mechanism to optionally and automatically trigger a background analysis or gathering of cardinality statistics on a TokuDB tables. These statistics are used for JOIN
optimizations and helping the MySQL optimizer choose the appropriate index for a query. If a table statistics or index cardinality becomes outdated, you might see queries which previously performed well suddenly running much slower until statistics are updated again. For TokuDB cardinality statistics were only updated when an explicit ANALYZE TABLE
was issued.
Although this operation was an online in the sense that ANALYZE TABLE
within TokuDB didn’t explicitly lock anything MySQL itself would place a read lock on a table being analyzed, preventing writes while the analysis is in progress.
There was also no throttling or scope limiting mechanism for TokuDB ANALYZE TABLE
other than time limit (tokudb_analyze_time
5 second default) that it spends on each index in a table.
In Percona Server for MySQL 5.6.27-76.0
new behavior and variables have been implemented to have the tables automatically analyzed in the background based on a measured change in data. This has been done by implementing the new background job manager that can perform operations on a background thread.
Background Jobs¶
Background jobs and schedule are transient in nature and are not persisted anywhere. Any currently running job will be terminated on shutdown and all scheduled jobs will be forgotten about on server restart. There can’t be two jobs on the same table scheduled or running at any one point in time. If you manually invoke an ANALYZE TABLE
that conflicts with either a pending or running job, the running job will be canceled and the users task will run immediately in the foreground. All the scheduled and running background jobs can be viewed by querying the TOKUDB_BACKGROUND_JOB_STATUS
table.
New tokudb_analyze_in_background
variable has been implemented in order to control if the ANALYZE TABLE
will be dispatched to the background process or if it will be running in the foreground.
To control the function of ANALYZE TABLE
a new tokudb_analyze_mode
variable has been implemented. This variable offers options to cancel any running or scheduled job on the specified table (TOKUDB_ANALYZE_CANCEL
), use existing analysis algorithm (TOKUDB_ANALYZE_STANDARD
), or to recount the logical rows in table and update persistent count (TOKUDB_ANALYZE_RECOUNT_ROWS
).
TOKUDB_ANALYZE_RECOUNT_ROWS
is a new mechanism that is used to perform a logical recount of all rows in a table and persist that as the basis value for the table row estimate. This mode was added for tables that have been upgraded from an older version of TokuDB that only reported physical row counts and never had a proper logical row count. Newly created tables/partitions will begin counting logical rows correctly from their creation and should not need to be recounted unless some odd edge condition causes the logical count to become inaccurate over time. This analysis mode has no effect on the table cardinality counts. It will take the currently set session values for tokudb_analyze_in_background
, and tokudb_analyze_throttle
. Changing the global or session instances of these values after scheduling will have no effect on the job.
Any background job, both pending and running, can be canceled by setting the tokudb_analyze_mode
to TOKUDB_ANALYZE_CANCEL
and issuing the ANALYZE TABLE
on the table for which you want to cancel all the jobs for.
Auto analysis¶
To implement the background analysis and gathering of cardinality statistics on a TokuDB tables new delta
value is now maintained in memory for each TokuDB table. This value is not persisted anywhere and it is reset to 0
on a server start. It is incremented for each INSERT/UPDATE/DELETE
command and ignores the impact of transactions (rollback specifically). When this delta value exceeds the tokudb_auto_analyze
percentage of rows in the table an analysis is performed according to the current session’s settings. Other analysis for this table will be disabled until this analysis completes. When this analysis completes, the delta is reset to 0
to begin recalculating table changes for the next potential analysis.
Status values are now reported to server immediately upon completion of any analysis (previously new status values were not used until the table has been closed and re-opened). Half-time direction reversal of analysis has been implemented, meaning that if a tokudb_analyze_time
is in effect and the analysis has not reached the half way point of the index by the time tokudb_analyze_time
/2 has been reached: it will stop the forward progress and restart the analysis from the last/rightmost row in the table, progressing leftwards and keeping/adding to the status information accumulated from the first half of the scan.
For small ratios of table_rows
/ tokudb_auto_analyze
, auto analysis will be run for almost every change. The trigger formula is: if (table_delta >= ((table_rows * tokudb_auto_analyze) / 100))
then run ANALYZE TABLE
. If a user manually invokes an ANALYZE TABLE
and tokudb_auto_analyze
is enabled and there are no conflicting background jobs, the users ANALYZE TABLE
will behave exactly as if the delta level has been exceeded in that the analysis is executed and delta reset to 0
upon completion.
System Variables¶
-
variable
tokudb_analyze_in_background
¶ Command Line: Yes Config File: Yes Scope: Global/Session Dynamic: Yes Variable Type: Boolean Default Value: Off
When this variable is set to ON
it will dispatch any ANALYZE TABLE
job to a background process and return immediately, otherwise ANALYZE TABLE
will run in foreground/client context.
-
variable
tokudb_analyze_mode
¶ Command Line: Yes Config File: Yes Scope: Global/Session Dynamic: Yes Variable Type: ENUM Default Value: TOKUDB_ANALYZE_STANDARD
Range: TOKUDB_ANALYZE_CANCEL
,TOKUDB_ANALYZE_STANDARD
,TOKUDB_ANALYZE_RECOUNT_ROWS
This variable is used to control the function of ANALYZE TABLE
. Possible values are:
TOKUDB_ANALYZE_CANCEL
- Cancel any running or scheduled job on the specified table.TOKUDB_ANALYZE_STANDARD
- Use existing analysis algorithm. This is the standard table cardinality analysis mode used to obtain cardinality statistics for a tables and its indexes. It will take the currently set session values fortokudb_analyze_time
,tokudb_analyze_in_background
, andtokudb_analyze_throttle
at the time of its scheduling, either via a user invokedANALYZE TABLE
or an auto schedule as a result oftokudb_auto_analyze
threshold being hit. Changing the global or session instances of these values after scheduling will have no effect on the scheduled job.TOKUDB_ANALYZE_RECOUNT_ROWS
- Recount logical rows in table and update persistent count. This is a new mechanism that is used to perform a logical recount of all rows in a table and persist that as the basis value for the table row estimate. This mode was added for tables that have been upgraded from an older version of TokuDB/PerconaFT that only reported physical row counts and never had a proper logical row count. Newly created tables/partitions will begin counting logical rows correctly from their creation and should not need to be recounted unless some odd edge condition causes the logical count to become inaccurate over time. This analysis mode has no effect on the table cardinality counts. It will take the currently set session values fortokudb_analyze_in_background
, andtokudb_analyze_throttle
. Changing the global or session instances of these values after scheduling will have no effect on the job.
-
variable
tokudb_analyze_throttle
¶ Command Line: Yes Config File: Yes Scope: Global/Session Dynamic: Yes Variable Type: Numeric Default Value: 0
This variable is used to define maximum number of keys to visit per second when performing ANALYZE TABLE
with either a TOKUDB_ANALYZE_STANDARD
or TOKUDB_ANALYZE_RECOUNT_ROWS
.
-
variable
tokudb_analyze_time
¶ Command Line: Yes Config File: Yes Scope: Global/Session Dynamic: Yes Variable Type: Numeric Default Value: 5
This session variable controls the number of seconds an analyze operation will spend on each index when calculating cardinality. Cardinality is shown by executing the following command:
SHOW INDEXES FROM table_name;
If an analyze is never performed on a table then the cardinality is 1
for primary key indexes and unique secondary indexes, and NULL
(unknown) for all other indexes. Proper cardinality can lead to improved performance of complex SQL statements.
-
variable
tokudb_auto_analyze
¶ Command Line: Yes Config File: Yes Scope: Global/Session Dynamic: Yes Variable Type: Numeric Default Value: 0
Percentage of table change as INSERT/UPDATE/DELETE
commands to trigger an ANALYZE TABLE
using the current session tokudb_analyze_in_background
, tokudb_analyze_mode
, tokudb_analyze_throttle
, and tokudb_analyze_time
settings. If this variable is enabled and tokudb_analyze_in_background
variable is set to OFF
, analysis will be performed directly within the client thread context that triggered the analysis. NOTE: InnoDB enabled this functionality by default when they introduced it. Due to the potential unexpected new load it might place on a server, it is disabled by default in TokuDB.
-
variable
tokudb_cardinality_scale_percent
¶ Command Line: Yes Config File: Yes Scope: Global Dynamic: Yes Variable Type: Numeric Default Value: 50 Range: 0-100
Percentage to scale table/index statistics when sending to the server to make an index appear to be either more or less unique than it actually is. InnoDB has a hard coded scaling factor of 50%. So if a table of 200 rows had an index with 40 unique values, InnoDB would return 200/40/2 or 2 for the index. The new TokuDB formula is the same but factored differently to use percent, for the same table.index (200/40 * tokudb_cardinality_scale
) / 100, for a scale of 50% the result would also be 2 for the index.
INFORMATION_SCHEMA Tables¶
Note
If you’re upgrading an existing TokuDB installation prior to 5.6.27-76.0
, the TokuDB plugin must be disabled and re-enabled via ps_tokudb_admin to register this new table. Alternately the plugin can be enabled by running:
INSTALL PLUGIN tokudb_background_job_status SONAME 'ha_tokudb.so'
-
table
INFORMATION_SCHEMA.
TOKUDB_BACKGROUND_JOB_STATUS
¶ Columns: - id – Simple monotonically incrementing job id, resets to
0
on server start. - database_name – Database name
- table_name – Table name
- job_type – Type of job, either
TOKUDB_ANALYZE_STANDARD
orTOKUDB_ANALYZE_RECOUNT_ROWS
- job_params – Param values used by this job in string format. For example:
TOKUDB_ANALYZE_DELETE_TIME=1.0; TOKUDB_ANALYZE_TIME=5; TOKUDB_ANALYZE_THROTTLE=2048;
- scheduler – Either
USER
orAUTO
to indicate if the job was explicitly scheduled by a user or if it was scheduled as an automatic trigger - scheduled_time – The time the job was scheduled
- started_time – The time the job was started
- status – Current job status if running. For example:
ANALYZE TABLE standard db.tbl.idx 3 of 5 50% rows 10% time scanning forward
- id – Simple monotonically incrementing job id, resets to
This table holds the information on scheduled and running background ANALYZE TABLE
jobs for TokuDB tables.
Version Specific Information¶
5.6.27-76.0
: Feature implemented
Contact Us
For free technical help, visit the Percona Community Forum.To report bugs or submit feature requests, open a JIRA ticket.
For paid support and managed or professional services, contact Percona Sales.