One of the most common tasks for database administrators is checking logs; you can either work directly with the log file or process it using another tool. Either way, regularly checking the logs remains essential.
Within this context, certain log messages occasionally appear that, unfortunately, have little written about them; neither the community nor the official documentation offers much explanation.
This article aims to explain the ‘RSM not processing response’ message in more depth and provide a more solid foundation for understanding it.
We will break this article into three sections:
First things first, let’s take a look at a typical instance of the error we’re discussing:
```json
{"t":{"$date":"2024-03-05T08:33:41.685-03:00"},"s":"I", "c":"-", "id":4495400, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM not processing response","attr":{"error":{"code":0,"codeName":"OK"},"replicaSet":"rs0"}}
```
Before jumping into those sections, let’s break down that message and explain what mongo.log is trying to communicate to us.
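Since mongod writes structured JSON logs (MongoDB 4.4+), each field of the sample entry above can be inspected with any JSON library. Here is a minimal Python sketch (not part of MongoDB’s tooling) that parses that exact line:

```python
import json

# The sample log entry from above (MongoDB's structured JSON log format)
line = ('{"t":{"$date":"2024-03-05T08:33:41.685-03:00"},"s":"I","c":"-",'
        '"id":4495400,"ctx":"ReplicaSetMonitor-TaskExecutor",'
        '"msg":"RSM not processing response",'
        '"attr":{"error":{"code":0,"codeName":"OK"},"replicaSet":"rs0"}}')

entry = json.loads(line)
print(entry["s"])                    # severity: "I" (informational)
print(entry["id"])                   # unique log id: 4495400
print(entry["ctx"])                  # thread that logged it: ReplicaSetMonitor-TaskExecutor
print(entry["msg"])                  # the message itself
print(entry["attr"]["replicaSet"])   # which replica set the monitor was watching: "rs0"
```

Note that the severity is ‘I’ (informational) and the thread (`ctx`) is ReplicaSetMonitor-TaskExecutor; those two fields are the starting point for the discussion below.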
As we’ve seen in the breakdown of the log, this message is associated with the ReplicaSetMonitor-TaskExecutor thread. The ReplicaSetMonitor (RSM) is a critical component in MongoDB, tasked with tracking the status and configuration of the replica set. When the log says ‘RSM not processing response,’ it indicates an issue in how the RSM is handling or reacting to responses received during its monitoring activities; from the documentation, we have more details on the goal behind the ReplicaSetMonitor (RSM):
This monitoring/discovery is performed via the isMaster/hello command.
What is happening here is that your node received a response from that isMaster/hello request, but the monitor is in a shutdown state; thus, it can’t process the response.
It’s also important to mention that MongoDB uses two methods for this tracking process: ‘sdam’ (Server Discovery and Monitoring) and the ‘streamable’ method. SDAM is how MongoDB figures out the layout and status of a replica set: which member is the primary and which are the secondaries. The ‘streamable’ method is preferred for its efficiency in watching for changes in the topology: it learns much sooner about stepdowns, elections, reconfigs, and other events. This method maintains up-to-date, long-lived connections with each member of the set without overloading the system, enhancing the effectiveness of the RSM.
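As a rough illustration of the difference between the two modes, the following Python sketch mirrors the shape of the request documents each one sends. This is hypothetical illustration code, not MongoDB source; the 10-second wait value follows the SDAM streaming protocol, while the server’s actual `kMaxAwaitTime` constant may differ:

```python
def single_hello():
    # One-shot hello: ask once, get a single topology snapshot back.
    return {"hello": 1}

def streamable_hello(topology_version, max_await_time_ms=10_000):
    # Awaitable ("streamable") hello: the server holds the request open for
    # up to maxAwaitTimeMS and replies early if the topology changes, so the
    # monitor hears about stepdowns/elections/reconfigs almost immediately.
    return {
        "hello": 1,
        "maxAwaitTimeMS": max_await_time_ms,
        "topologyVersion": topology_version,  # version from the previous reply
    }
```

Compare these dictionaries with the `BSONObjBuilder` calls in the server functions quoted below: the streamable variant appends `maxAwaitTimeMS` and `topologyVersion`, while the single variant sends only `hello`.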
```cpp
StatusWith<TaskExecutor::CallbackHandle> SingleServerDiscoveryMonitor::_scheduleStreamableHello() {
    auto maxAwaitTimeMS = durationCount<Milliseconds>(kMaxAwaitTime);
    overrideMaxAwaitTimeMS.execute([&](const BSONObj& data) {
        maxAwaitTimeMS =
            durationCount<Milliseconds>(Milliseconds(data["maxAwaitTimeMS"].numberInt()));
    });

    BSONObjBuilder bob;
    bob.append("hello", 1);
    bob.append("maxAwaitTimeMS", maxAwaitTimeMS);
    bob.append("topologyVersion", _topologyVersion->toBSON());

    WireSpec::getWireSpec(getGlobalServiceContext()).appendInternalClientWireVersionIfNeeded(&bob);

    const auto timeoutMS = _connectTimeout + kMaxAwaitTime;
    auto request = executor::RemoteCommandRequest(
        HostAndPort(_host), DatabaseName::kAdmin, bob.obj(), nullptr, timeoutMS);
    request.sslMode = _setUri.getSSLMode();

    auto swCbHandle = _executor->scheduleExhaustRemoteCommand(
        request,
        [self = shared_from_this(), helloStats = _stats->collectHelloStats()](
            const executor::TaskExecutor::RemoteCommandCallbackArgs& result) mutable {
            Milliseconds nextRefreshPeriod;
            {
                stdx::lock_guard lk(self->_mutex);

                if (self->_isShutdown) {
                    self->_helloOutstanding = false;
                    LOGV2_DEBUG(4495400,
                                kLogLevel,
                                "RSM not processing response",
                                "error"_attr = result.response.status,
                                "replicaSet"_attr = self->_setUri.getSetName());
                    return;
                }
```
Also:
```cpp
StatusWith<TaskExecutor::CallbackHandle> SingleServerDiscoveryMonitor::_scheduleSingleHello() {
    BSONObjBuilder bob;
    bob.append("hello", 1);

    WireSpec::getWireSpec(getGlobalServiceContext()).appendInternalClientWireVersionIfNeeded(&bob);

    auto request = executor::RemoteCommandRequest(
        HostAndPort(_host), DatabaseName::kAdmin, bob.obj(), nullptr, _connectTimeout);
    request.sslMode = _setUri.getSSLMode();

    auto swCbHandle = _executor->scheduleRemoteCommand(
        request,
        [self = shared_from_this(), helloStats = _stats->collectHelloStats()](
            const executor::TaskExecutor::RemoteCommandCallbackArgs& result) mutable {
            Milliseconds nextRefreshPeriod;
            {
                stdx::lock_guard lk(self->_mutex);
                self->_helloOutstanding = false;

                if (self->_isShutdown) {
                    LOGV2_DEBUG(4333219,
                                kLogLevel,
                                "RSM not processing response",
                                "error"_attr = result.response.status,
                                "replicaSet"_attr = self->_setUri.getSetName());
                    return;
                }
```
In both functions responsible for the ReplicaSetMonitor, we have the `if (self->_isShutdown)` condition. When this condition is true, indicating that the ReplicaSetMonitor (RSM) is in a shutdown state, it triggers a log entry with the message ‘RSM not processing response’. The log includes attributes such as the error status and the replica set name. While in the shutdown state, the RSM stops processing responses, which is exactly what the log reflects.
In a standard replication configuration, the ReplicaSetMonitor (RSM) is expected to operate continuously, tracking the status of nodes in a replica set.
At the administrative layer, we have no control over this process; it starts and stops automatically. For example, during a database shutdown, you should expect to see this message because the database process is performing cleanup before exiting. MongoDB controls all of these processes internally.
Another condition that can lead the process to shut down is a replica set configuration change; if there are significant changes in the replica set or in connection parameters, the RSM may be shut down and recreated to apply these updates.
To understand this better, you can track the sequence of ReplicaSetMonitor events in the logs. However, apart from a node shutdown, the log itself is not very verbose about why the RSM task entered the shutdown state.
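To gauge how often the message is occurring, a small script can scan the structured log for the two log ids that carry it: 4495400 (streamable hello) and 4333219 (single hello). This is a hypothetical helper, a minimal sketch assuming the default JSON log format:

```python
import json
from collections import Counter

# Log ids that emit 'RSM not processing response' in the server source:
# 4495400 (_scheduleStreamableHello) and 4333219 (_scheduleSingleHello).
RSM_LOG_IDS = {4495400, 4333219}

def count_rsm_messages(lines):
    """Count 'RSM not processing response' entries per replica set name."""
    counts = Counter()
    for raw in lines:
        try:
            entry = json.loads(raw)
        except ValueError:
            continue  # skip lines that are not structured JSON
        if entry.get("id") in RSM_LOG_IDS:
            counts[entry["attr"]["replicaSet"]] += 1
    return counts
```

You could run it over your log file, for example `count_rsm_messages(open("/var/log/mongodb/mongod.log"))` (the path is an assumption; adjust it to your `systemLog.path`). A steadily growing count outside of shutdowns or reconfigs is the flood scenario discussed below.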
It’s important to mention that although the message itself is not flagged as an ERROR or WARNING, having the RSM in a shutdown state during normal operation is unhealthy for the cluster, mostly because of its purpose: monitoring the state of the replica set.
With the RSM down, some problems may arise:
As said before, we have no control over the thread itself; if you are being flooded with these messages, the best action you can take is to restart the database process.
```shell
sudo systemctl restart mongodb
```
Although a bit rough, a clean stop-and-start is currently the only way to bring the process back to a healthy state.
In this article, we explored the ‘RSM not processing response’ message in MongoDB, a critical indicator of the ReplicaSetMonitor’s shutdown state. Understanding this message helps in maintaining the health and performance of MongoDB’s replication system, with a restart of the database process as a potential solution for persistent issues.
Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.