Galera Error Failed to Report Last Committed (Interrupted System Call)

In this blog post, we'll discuss what the Galera warning "Failed to report last committed (Interrupted system call)" means and whether it is a cause for concern.

I have recently seen this error with Percona XtraDB Cluster (or Galera):

[Warning] WSREP: Failed to report last committed 549684236, -4 (Interrupted system call)

It was reported as a bug on Launchpad in 2013: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1434646

My colleague Przemek replied and explained it as follows:

Reporting the last committed transaction is just a part of the certification index purge process. In case it fails for some reason (it occasionally does), the cert index purge may be a little delayed. But it does not mean the transaction was not applied successfully. This is a warning after all.

If we look this error up in the source code, we see that Galera reuses the Linux system error numbers: the -4 in the message is the negated value of EINTR. Specifically:

#define EINTR 4 /* Interrupted system call */
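
To make the mapping concrete, here is a minimal C++ sketch that decodes the code the same way the log line does: it negates the value and hands it to strerror. The helper names `describe_wsrep_code` and `is_interrupted` are hypothetical and are not part of Galera, and the exact message text depends on the platform's C library.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Galera): turns a negative return code
// such as -4 into the text that appears in parentheses in the warning.
std::string describe_wsrep_code(int ret)
{
    if (ret >= 0) return "no error";
    // The code in the log is a negated errno value, so negate it again
    // before looking up the message.
    return std::strerror(-ret);
}

// True when the code means "interrupted", i.e. -EINTR (-4 on Linux).
bool is_interrupted(int ret)
{
    return ret == -EINTR;
}
```

On the Linux systems shown in this post, passing -4 to `describe_wsrep_code` should yield the same "Interrupted system call" text that appears in the warning.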

As there isn’t much documentation regarding this error, and internet searches did not bring up useful information, my colleague David Bennett and I delved into the source code (as we do on occasion).

In the Galera source file gcs_sm.hpp, we see:

* @retval -EINTR  - was interrupted by another thread

We also see:

/* was interrupted, will be handled by someone else */

This means that the reporting thread was interrupted, and the report will be handled by another thread. As it is only a warning, it isn't anything to be too concerned about, unless the warnings begin to pile up (which could be a sign of concurrency issues).
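
One way to tell whether the warnings are piling up is to count them in the error log. The following is a rough C++17 sketch under stated assumptions: it expects log lines in the format shown in this post, and the names `CommitReportWarning`, `parse_commit_report_warning`, and `count_by_code` are hypothetical helpers, not part of Galera or Percona XtraDB Cluster.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// One parsed "Failed to report last committed" warning.
struct CommitReportWarning
{
    long long seqno; // last committed sequence number from the message
    int       code;  // negative errno-style code, e.g. -4
};

// Hypothetical helper: extracts seqno and code from a log line, or returns
// std::nullopt when the line is not this warning.
std::optional<CommitReportWarning>
parse_commit_report_warning(const std::string& line)
{
    static const std::regex re(
        R"re(WSREP: Failed to report last committed (-?\d+), (-?\d+) \()re");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    return CommitReportWarning{ std::stoll(m[1].str()), std::stoi(m[2].str()) };
}

// Counts warnings per error code so a growing total is easy to spot.
std::map<int, std::size_t> count_by_code(const std::vector<std::string>& lines)
{
    std::map<int, std::size_t> counts;
    for (const auto& line : lines)
        if (auto w = parse_commit_report_warning(line))
            ++counts[w->code];
    return counts;
}
```

Feeding it the lines of an error log and comparing the totals over time gives a rough idea of whether the warnings are occasional or steadily increasing.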

The specific warning is thrown from galera_service_thd.cpp here:

if (gu_unlikely(ret < 0))
{
    log_warn << "Failed to report last committed "
             << data.last_committed_ << ", " << ret
             << " (" << strerror (-ret) << ')';
    // @todo: figure out what to do in this case
}

This warning could be handled better, so that it neither floods the logs nor sounds cryptic enough to worry administrators.
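
As an illustration of what less noisy handling could look like, here is a small C++17 sketch of a warning throttle. It is purely hypothetical and is not how Galera handles this warning: the class `WarningThrottle` and its interface are assumptions made for the example. It lets one message through per interval and appends a count of the occurrences it suppressed in between.

```cpp
#include <chrono>
#include <optional>
#include <string>

// Hypothetical throttle (not Galera code): lets one warning through per
// interval and counts the ones it suppressed in between.
class WarningThrottle
{
public:
    using Clock = std::chrono::steady_clock;

    explicit WarningThrottle(Clock::duration interval) : interval_(interval) {}

    // Returns the text to log, or std::nullopt if this occurrence should be
    // suppressed. The caller passes the current time, which keeps the class
    // easy to test.
    std::optional<std::string> on_warning(const std::string& msg,
                                          Clock::time_point now)
    {
        if (has_logged_ && now - last_logged_ < interval_)
        {
            ++suppressed_;
            return std::nullopt;
        }
        std::string out = msg;
        if (suppressed_ > 0)
            out += " (" + std::to_string(suppressed_) +
                   " similar warnings suppressed)";
        suppressed_  = 0;
        last_logged_ = now;
        has_logged_  = true;
        return out;
    }

private:
    Clock::duration   interval_;
    Clock::time_point last_logged_{};
    bool              has_logged_ = false;
    unsigned long     suppressed_ = 0;
};
```

A caller would invoke `on_warning` with `WarningThrottle::Clock::now()` at the point where the warning is raised and write the returned text to the log only when a value comes back.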

Comments (3)

  • Roel Van de Paar

    On PXC 5.7, using local sysbench, I see this message on an HDD:

    2016-08-09T09:43:58.131621Z 0 [Warning] WSREP: Failed to report last committed 71, -4 (Interrupted system call)
    2016-08-09T09:44:01.746062Z 0 [Warning] WSREP: Failed to report last committed 43, -4 (Interrupted system call)
    2016-08-09T09:44:04.974485Z 0 [Warning] WSREP: Failed to report last committed 45, -4 (Interrupted system call)
    2016-08-09T09:44:12.695912Z 0 [Warning] WSREP: Failed to report last committed 51, -4 (Interrupted system call)
    2016-08-09T09:44:16.046751Z 0 [Warning] WSREP: Failed to report last committed 53, -4 (Interrupted system call)

    The same does not happen on an SSD in the same server (all else being equal).

    August 9, 2016 at 6:01 am
  • Vasiliy Petrov

    I got a slightly different message:
    [Warning] WSREP: Failed to report last committed 285293519, -110 (Connection timed out)
    What does it mean?

    February 6, 2018 at 1:24 am
    • Krunal Bauskar

      This simply means that the node was unable to send the commit report notification to the group channel, probably due to heavy network traffic. It falls into the same category and can be ignored, but it is also a signal that you probably want to re-evaluate your load and available network bandwidth. Things will not break immediately, but if this keeps growing, you may see nodes dropping out in the future due to network issues.

      February 19, 2018 at 10:00 pm
