One of item I always look at SHOW ENGINE INNODB STATUS to see if there are any transactions spending very long time in ACTIVE state. In the perfect world if you’re running online system you should not see transactions spending more than couple of seconds in ACTIVE state. Especially ACTIVE transactions which do not currently run any query are suspicious. There are however cases when screaming fire about ACTIVE transactions alone would be misleading. There is a whole set of applications which run quite fine while having ACTIVE measured in hours. It is JAVA applications which often run in AUTOCOMMIT=0 mode and do not explicitly commit transactions unless there were any writes. If database is configured in READ-COMMITTED transaction mode it is actually fine from performance point of view as Innodb does not have to preserve row versions going back to start of transactions.
There is however a better metric, which in most cases will indicate the problem waiting to happen, this is when you have long enough ACTIVE transactions which have some locks, like in this case:
MySQL thread id 2059, query id 1014425 10.10.10.10 user
Trx read view will not see trx with id >= 2 1418401228, sees < 2 1418374947
---TRANSACTION 2 1418401057, ACTIVE 295 sec, process no 1098, OS thread id 1224980800
3 lock struct(s), heap size 368, 1 row lock(s), undo log entries 1
MySQL thread id 1720, query id 1011283 10.10.10.10 user
---TRANSACTION 2 1418397881, ACTIVE 317 sec, process no 1098, OS thread id 1229596992
3 lock struct(s), heap size 368, 1 row lock(s), undo log entries 2
In this case we see transactions are holding some locks and perform some update/deletes as there are undo log entries. This is bad because of 2 reasons. First if some other query will try to lock
the same rows it will have to wait… possibly until innodb_lock_wait_timeout elapses and when it completes with error. Second because if you have some updated rows Innodb will have
to maintain the read snapshot corresponding to this old transaction and so you can get system to slow down because of maintaining many old row versions.
I would prefer not to see transactions ACTIVE for very long time at all but the ones having locks is really the killer.
If you can’t fix your connection pool to rollback transactions when they are recycled it might be good idea to set wait_timeout to some reasonable value, say 60 seconds. In this case even if you get rouge transaction it will be killed quickly instead of stalling for potentially many hours.