One of the most common tasks for database administrators is checking logs; you can either work directly with the log file or process it using another tool. Either way, regularly checking the logs remains essential.
Within this context, certain log messages occasionally appear that, unfortunately, have little written about them; neither the community nor the official documentation offers much explanation.
This article aims to explain the ‘RSM not processing response’ message in more depth and provide a more solid foundation for understanding it.
We will break this article into three sections:
First things first, let’s take a look at a typical instance of the error we’re discussing:
```json
{"t":{"$date":"2024-03-05T08:33:41.685-03:00"},"s":"I", "c":"-", "id":4495400, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM not processing response","attr":{"error":{"code":0,"codeName":"OK"},"replicaSet":"rs0"}}
```
Before jumping into those sections, let’s break down that message and explain what mongo.log is trying to communicate to us.
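Since mongod writes structured JSON logs (MongoDB 4.4+), each field of the sample entry above can be inspected with any JSON library. Here is a minimal Python sketch (not part of MongoDB’s tooling) that parses that exact line:

```python
import json

# The sample log entry from above (MongoDB's structured JSON log format)
line = ('{"t":{"$date":"2024-03-05T08:33:41.685-03:00"},"s":"I","c":"-",'
        '"id":4495400,"ctx":"ReplicaSetMonitor-TaskExecutor",'
        '"msg":"RSM not processing response",'
        '"attr":{"error":{"code":0,"codeName":"OK"},"replicaSet":"rs0"}}')

entry = json.loads(line)
print(entry["s"])                    # severity: "I" (informational)
print(entry["id"])                   # unique log id: 4495400
print(entry["ctx"])                  # thread that logged it: ReplicaSetMonitor-TaskExecutor
print(entry["msg"])                  # the message itself
print(entry["attr"]["replicaSet"])   # which replica set the monitor was watching: "rs0"
```

Note that the severity is ‘I’ (informational) and the thread (`ctx`) is ReplicaSetMonitor-TaskExecutor; those two fields are the starting point for the discussion below.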
As we’ve seen in the breakdown of the log, this message is associated with the ReplicaSetMonitor-TaskExecutor thread. The ReplicaSetMonitor (RSM) is a critical component in MongoDB, tasked with tracking the status and configuration of the replica set. When the log says ‘RSM not processing response,’ it indicates an issue in how the RSM is handling or reacting to responses received during its monitoring activities; from the documentation, we have more details on the goal behind the ReplicaSetMonitor (RSM):
This monitoring/discovery is performed via the isMaster/hello command.
What is happening here is that your node received a response from that isMaster/hello request, but the monitor is in a shutdown state; thus, it can’t process the response.
It’s also important to mention that MongoDB uses two methods for this tracking process: ‘sdam’ (Server Discovery and Monitoring) and the ‘streamable’ method. SDAM is how MongoDB figures out the layout and status of a replica set: which member is the primary and which are the secondaries. The ‘streamable’ method is preferred for its efficiency in watching for changes in the topology: it learns much sooner about stepdowns, elections, reconfigs, and other events. This method maintains up-to-date, long-lived connections with each member of the set without overloading the system, enhancing the effectiveness of the RSM.
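As a rough illustration of the difference between the two modes, the following Python sketch mirrors the shape of the request documents each one sends. This is hypothetical illustration code, not MongoDB source; the 10-second wait value follows the SDAM streaming protocol, while the server’s actual `kMaxAwaitTime` constant may differ:

```python
def single_hello():
    # One-shot hello: ask once, get a single topology snapshot back.
    return {"hello": 1}

def streamable_hello(topology_version, max_await_time_ms=10_000):
    # Awaitable ("streamable") hello: the server holds the request open for
    # up to maxAwaitTimeMS and replies early if the topology changes, so the
    # monitor hears about stepdowns/elections/reconfigs almost immediately.
    return {
        "hello": 1,
        "maxAwaitTimeMS": max_await_time_ms,
        "topologyVersion": topology_version,  # version from the previous reply
    }
```

Compare these dictionaries with the `BSONObjBuilder` calls in the server functions quoted below: the streamable variant appends `maxAwaitTimeMS` and `topologyVersion`, while the single variant sends only `hello`.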
```cpp
StatusWith<TaskExecutor::CallbackHandle> SingleServerDiscoveryMonitor::_scheduleStreamableHello() {
    auto maxAwaitTimeMS = durationCount<Milliseconds>(kMaxAwaitTime);
    overrideMaxAwaitTimeMS.execute([&](const BSONObj& data) {
        maxAwaitTimeMS =
            durationCount<Milliseconds>(Milliseconds(data["maxAwaitTimeMS"].numberInt()));
    });

    BSONObjBuilder bob;
    bob.append("hello", 1);
    bob.append("maxAwaitTimeMS", maxAwaitTimeMS);
    bob.append("topologyVersion", _topologyVersion->toBSON());

    WireSpec::getWireSpec(getGlobalServiceContext()).appendInternalClientWireVersionIfNeeded(&bob);

    const auto timeoutMS = _connectTimeout + kMaxAwaitTime;
    auto request = executor::RemoteCommandRequest(
        HostAndPort(_host), DatabaseName::kAdmin, bob.obj(), nullptr, timeoutMS);
    request.sslMode = _setUri.getSSLMode();

    auto swCbHandle = _executor->scheduleExhaustRemoteCommand(
        request,
        [self = shared_from_this(), helloStats = _stats->collectHelloStats()](
            const executor::TaskExecutor::RemoteCommandCallbackArgs& result) mutable {
            Milliseconds nextRefreshPeriod;
            {
                stdx::lock_guard lk(self->_mutex);

                if (self->_isShutdown) {
                    self->_helloOutstanding = false;
                    LOGV2_DEBUG(4495400,
                                kLogLevel,
                                "RSM not processing response",
                                "error"_attr = result.response.status,
                                "replicaSet"_attr = self->_setUri.getSetName());
                    return;
                }
```
Also:
```cpp
StatusWith<TaskExecutor::CallbackHandle> SingleServerDiscoveryMonitor::_scheduleSingleHello() {
    BSONObjBuilder bob;
    bob.append("hello", 1);

    WireSpec::getWireSpec(getGlobalServiceContext()).appendInternalClientWireVersionIfNeeded(&bob);

    auto request = executor::RemoteCommandRequest(
        HostAndPort(_host), DatabaseName::kAdmin, bob.obj(), nullptr, _connectTimeout);
    request.sslMode = _setUri.getSSLMode();

    auto swCbHandle = _executor->scheduleRemoteCommand(
        request,
        [self = shared_from_this(), helloStats = _stats->collectHelloStats()](
            const executor::TaskExecutor::RemoteCommandCallbackArgs& result) mutable {
            Milliseconds nextRefreshPeriod;
            {
                stdx::lock_guard lk(self->_mutex);
                self->_helloOutstanding = false;

                if (self->_isShutdown) {
                    LOGV2_DEBUG(4333219,
                                kLogLevel,
                                "RSM not processing response",
                                "error"_attr = result.response.status,
                                "replicaSet"_attr = self->_setUri.getSetName());
                    return;
                }
```
In both functions responsible for the ReplicaSetMonitor, we have the `if (self->_isShutdown)` condition. When this condition is true, indicating that the ReplicaSetMonitor (RSM) is in a shutdown state, it triggers a log entry with the message ‘RSM not processing response’. The log includes attributes such as the error status and the replica set name. While in the shutdown state, the RSM stops processing responses, which is exactly what the log reflects.
In a standard replication configuration, the ReplicaSetMonitor (RSM) is expected to operate continuously, tracking the status of nodes in a replica set.
At the administrative layer, we have no control over this process; it starts and stops automatically. For example, during a database shutdown, you should expect to see this message because the database process is performing cleanup before exiting. MongoDB controls all of these processes internally.
Another condition that can lead the process to shut down is a replica set configuration change; if there are significant changes in the replica set or in connection parameters, the RSM may be shut down and recreated to apply these updates.
To understand this better, you can track the sequence of ReplicaSetMonitor events in the logs. However, apart from a node shutdown, the log itself is not very verbose about why the RSM task entered the shutdown state.
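To gauge how often the message is occurring, a small script can scan the structured log for the two log ids that carry it: 4495400 (streamable hello) and 4333219 (single hello). This is a hypothetical helper, a minimal sketch assuming the default JSON log format:

```python
import json
from collections import Counter

# Log ids that emit 'RSM not processing response' in the server source:
# 4495400 (_scheduleStreamableHello) and 4333219 (_scheduleSingleHello).
RSM_LOG_IDS = {4495400, 4333219}

def count_rsm_messages(lines):
    """Count 'RSM not processing response' entries per replica set name."""
    counts = Counter()
    for raw in lines:
        try:
            entry = json.loads(raw)
        except ValueError:
            continue  # skip lines that are not structured JSON
        if entry.get("id") in RSM_LOG_IDS:
            counts[entry["attr"]["replicaSet"]] += 1
    return counts
```

You could run it over your log file, for example `count_rsm_messages(open("/var/log/mongodb/mongod.log"))` (the path is an assumption; adjust it to your `systemLog.path`). A steadily growing count outside of shutdowns or reconfigs is the flood scenario discussed below.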
It’s important to mention that although the message itself is not flagged as an ERROR or WARNING, having the RSM in a shutdown state during normal operation is unhealthy for the cluster, mostly because of its purpose: monitoring the state of the replica set.
With the RSM down, some problems may arise:
As said before, we have no control over the thread itself; if you are being flooded with these messages, the best action you can take is to restart the database process.
```shell
sudo systemctl restart mongodb
```
Although a bit rough, a clean stop-and-start is currently the only way to bring the process back to a healthy state.
In this article, we explored the ‘RSM not processing response’ message in MongoDB, a critical indicator of the ReplicaSetMonitor’s shutdown state. Understanding this message helps in maintaining the health and performance of MongoDB’s replication system, with a restart of the database process as a potential solution for persistent issues.
Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.