Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

MySQL – A Series of Bad Design Decisions

January 8, 2020

Author

Peter Zaitsev

MySQL

Share this Post:

Bad Design MySQL MySQL obviously got many things right, otherwise, it would not be the World’s Most Popular Open Source Database (according to DB-Engines). Sometimes, however, I run into some decisions or behaviors which are just plain bad designs. Many such designs have a lot of historical reasoning behind them and maybe they are still here because not enough resources are allocated to cleaning up technical debt.

I’m passionate about observability, especially when it comes to understanding system performance. One of the most important pieces of data to understand MySQL Performance is understanding its latches contention (mutexes, rwlocks, etc).

The “best” way to understand latches in MySQL is Performance Schema. Unfortunately latching profiling is disabled by default in Performance Schema because it causes quite a significant overhead; significant enough you likely will not be running with this instrumentation in production at all times.

If you’re looking to get some always available mutex information from MySQL, you can get them from the InnoDB storage engine (which is often good enough, as this is where most contention happens).

One choice is to look into SHOW ENGINE INNODB STATUS output – particularly in SEMAPHORES section:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 490020789
--Thread 140407582807808 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:
S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785
a writer (thread id 140407910762240) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2412
Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142
--Thread 140386577712896 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:
S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785
a writer (thread id 140407910762240) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 2412
Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 490020789

--Thread 140407582807808 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:

S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785

a writer (thread id 140407910762240) has reserved it in mode exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time read locked in file row0ins.cc line 2412

Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142

--Thread 140386577712896 has waited at row0ins.cc line 2412 for 0 seconds the semaphore:

S-lock on RW-latch at 0x7fa0159bd6f0 created in file buf0buf.cc line 785

a writer (thread id 140407910762240) has reserved it in mode exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time read locked in file row0ins.cc line 2412

Last time write locked in file /mnt/workspace/percona-server-8.0-debian-binary/label_exp/min-bionic-x64/test/percona-server-8.0.18-9/storage/innobase/include/mtr0mtr.ic line 142

This section will provide information on mutex which are being waited for along with wait time information which can be very helpful. Unfortunately, this information is provided in the form which is not easily parsable and only can be retrieved with whole SHOW ENGINE INNODB STATUS output, causing extra load and making it a poor fit for high-frequency sampling.

Why this information was never made accessible with some INFORMATION_SCHEMA table is a great puzzle. Was it because the idea was that PERFORMANCE_SCHEMA should be the one and only tool for Observability, even when the MySQL engineering team can’t get it to perform with acceptable overhead?

But wait, you say… there is a better way – you can use SHOW ENGINE INNODB MUTEX to get a summary of the stats:

mysql> SHOW ENGINE INNODB MUTEX;
+--------+----------------------------+-----------------+
| Type   | Name                       | Status          |
+--------+----------------------------+-----------------+
| InnoDB | rwlock: dict0dict.cc:2454  | waits=138       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=545       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=124       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=110       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=134       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=132       |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=5317      |
| InnoDB | rwlock: dict0dict.cc:2454  | waits=538       |

…
| InnoDB | rwlock: hash0hash.cc:171   | waits=219       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=291       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=290       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=312       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=281       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=226       |
| InnoDB | rwlock: hash0hash.cc:171   | waits=327       |
| InnoDB | sum rwlock: buf0buf.cc:785 | waits=138699546 |
+--------+----------------------------+-----------------+
332 rows in set (0.59 sec)

mysql> SHOW ENGINE INNODB MUTEX;

+--------+----------------------------+-----------------+

| Type | Name | Status |

+--------+----------------------------+-----------------+

| InnoDB | rwlock: dict0dict.cc:2454 | waits=138 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=545 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=124 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=110 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=134 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=132 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=5317 |

| InnoDB | rwlock: dict0dict.cc:2454 | waits=538 |

…

| InnoDB | rwlock: hash0hash.cc:171 | waits=219 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=291 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=290 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=312 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=281 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=226 |

| InnoDB | rwlock: hash0hash.cc:171 | waits=327 |

| InnoDB | sum rwlock: buf0buf.cc:785 | waits=138699546 |

+--------+----------------------------+-----------------+

332 rows in set (0.59 sec)

This command does not provide the same information (it shows the number of waits, not current waits) but it is helpful. The problem with this command is it looks like it was specially designed to be as least-useful as possible; see there are a lot of duplicates at “Name”. This is because there are multiple instances of the same kind of mutex. In many cases, when you want to understand what kind of contention you’re dealing with, you would like to SUM the number of Waits grouping it by Name, and unfortunately, you can’t do that with SHOW commands.

Even more strange is the choice of waits=N syntax and naming a column “Status” where relational database design would suggest using “Waits” as a column name instead.

I also would prefer to see sync object name here, not the source code line as it is usually a lot more descriptive. MariaDB does it, BTW, and it also makes it available as innodb_mutex information schema table.

Finally, note how slow (read: expensive) this command is: 0.6 seconds on an 80GB buffer pool. The reason is it captures mutex contention on buffer pool pages, which is super helpful at identifying page-specific contention but also requires aggregating information for potentially millions of objects which takes time.

I think the INFORMATION_SCHEMA table would be a much better choice to show this information too.

OK, so we can’t run plain and simple SELECT on INFORMATION_SCHEMA table to get the data in an easily digestible form, but maybe we should write a stored procedure instead ?

This brings us to another design problem. While you can iterate SELECT output in a stored procedure easily, it does not work for SHOW commands. There probably was a good practical reason for this limitation, but it is not user-friendly at all.

When nothing else helps there is always Shell scripting, and we can use it to solve this problem too:

root@rocky:~# mysql -BNe "SHOW ENGINE INNODB MUTEX" | awk -F't' '{split($3, waits, "="); out[$2]+=waits[2];} END { for(el in out) printf "%st%dn", el, out[el] } '
sum rwlock: buf0buf.cc:785      138984320
rwlock: btr0sea.cc:202  226767645
rwlock: trx0purge.cc:222        13
rwlock: ibuf0ibuf.cc:543        345822
rwlock: dict0dict.cc:2454       1958468
rwlock: dict0dict.cc:1042       66610
rwlock: fil0fil.cc:3150 1064444
rwlock: hash0hash.cc:171        37536
rwlock: dict0dict.cc:330        131

root@rocky:~# mysql -BNe "SHOW ENGINE INNODB MUTEX" | awk -F't' '{split($3, waits, "="); out[$2]+=waits[2];} END { for(el in out) printf "%st%dn", el, out[el] } '

sum rwlock: buf0buf.cc:785 138984320

rwlock: btr0sea.cc:202 226767645

rwlock: trx0purge.cc:222 13

rwlock: ibuf0ibuf.cc:543 345822

rwlock: dict0dict.cc:2454 1958468

rwlock: dict0dict.cc:1042 66610

rwlock: fil0fil.cc:3150 1064444

rwlock: hash0hash.cc:171 37536

rwlock: dict0dict.cc:330 131

However, if your database requires you to do that for basic grouping of the information it provides, there is something wrong here!

What would I like to see? I believe all SHOW statements should be reviewed and if they are not planned for deprecation, information similar to what they provide should be made available from tables or views. In fact, this work was already done for most common commands, but it looks like it was never completed.

0 0 votes

Article Rating

6 Comments

Oldest

Newest Most Voted

Morgan Tocker (@morgo)

6 years ago

I agree with your summary that this information should be available via tables or views. The intention with SHOW ENGINE INNODB MUTEX was actually to remove it:

http://www.tocker.ca/2014/03/05/a-followup-on-show-engine-innodb-mutex.html
http://www.tocker.ca/2015/06/25/show-engine-innodb-mutex-is-back.html

.. but it turned out to be popular, so it was re-added. So I respectfully disagree on your summary 🙂

I think the lesson is “if you expose an interface, be aware that you may be stuck maintaining it forever” *not* “there are not enough resources dedicated to cleaning up technical debt.”

In response to this question:

> Why this information was never made accessible with some INFORMATION_SCHEMA table is a great puzzle. Was it because the idea was that PERFORMANCE_SCHEMA should be the one and only tool for Observability, even when the MySQL engineering team can’t get it to perform with acceptable overhead?

information_schema is intended to be for more static meta data. Fast changing meta data is intended for performance schema. There are some historical violations, but you can see that they are being moved across. i.e. show_compatibility_56: https://dev.mysql.com/doc/refman/5.7/en/performance-schema-variable-table-migration.html

Why INFORMATION_SCHEMA is not suited for fast changing meta data is perhaps partially historical (since in 8.0 most(?) i_s tables are now views). Performance schema is implemented using the engine interface, so it is able to add indexes, and take advantage of general improvements to execution. There are more details in this post: https://mysqlserverteam.com/mysql-8-0-performance-schema-now-with-indexes/

Author

Peter Zaitsev

6 years ago

Reply to Morgan Tocker (@morgo)

Hi Morgan, Great to hear from you!

Frankly from practical standpoint I do not care too much if information is available in Performance Schema or Information Schema… But just like in Unix “everything is a file”, everything should be a table in relational database to ensure all power of language can be used to access and process this data.

Many SHOW commands got their tables, such as “SHOW PROCESSLIST” this command for some reason did not… In MySQL at Least. It did in MariaDB

Praji Venu

6 years ago

Nice article Peter. This question may be out of place , but let me anyway ask, How do we know which table or object(s) are causing the contention in buffer pool ?

Author

Peter Zaitsev

6 years ago

Reply to Praji Venu

There is no direct way to see it what I know. I would see at the wait statistics on the table/index access from Performance Schema. The time of object access will have IO/CPU/Contention combined but you can substract CPU based on file IO stats. With some idea about what CPU usage should be based on the row ops you can guess what objects specifically cause contention