Chapter 5. Replicating Node Manager

Table of Contents

5.1. Overview
5.2. Operating a Replicated Node Manager
5.3. Using a Replicated Node Manager
5.4. Terminating a Replicated Node Manager

This chapter describes how to replicate a node manager so that node manager operation can be recovered when the node manager terminates abnormally.

5.1. Overview

You can use a node manager to start and terminate a server and to restart a server when it fails. Because the node manager itself can also fail, it can be replicated so that a replicated node manager takes over from the node manager that failed.

5.2. Operating a Replicated Node Manager

A replicated node manager is a pair consisting of an active node manager and a standby node manager. The active node manager manages and processes requests from DAS or servers.

[Figure 5.1] Operation of a Replicated Node Manager

The standby node manager monitors the status of the active node manager. If the standby node manager detects a failure, it determines that the active node manager has terminated abnormally and takes over as the active node manager. A new standby node manager is also started to monitor the node manager that has become active. The detailed procedure is as follows:

  1. Active and standby node managers start at the same time.

  2. The standby node manager monitors the status of the active node manager.

  3. The standby node manager detects a failure in the active node manager.

  4. The standby node manager starts another standby node manager.

  5. The existing standby node manager becomes the active node manager and processes requests from servers.
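The steps above can be sketched as a small in-memory model. This is only an illustration of the ordering of the procedure, not the product's implementation: the class names `NodeManager` and `ReplicatedPair`, the `alive` flag, and the numeric identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class NodeManager:
    """Toy stand-in for a node manager process (hypothetical, not a product class)."""
    ident: int
    role: str            # "active" or "standby"
    alive: bool = True


@dataclass
class ReplicatedPair:
    """Models an active/standby pair and the failover steps listed above."""
    _ids: count = field(default_factory=lambda: count(1))
    active: Optional[NodeManager] = None
    standby: Optional[NodeManager] = None

    def start(self) -> None:
        # Step 1: the active and standby node managers start together.
        self.active = NodeManager(next(self._ids), "active")
        self.standby = NodeManager(next(self._ids), "standby")

    def monitor_once(self) -> bool:
        """Run one monitoring pass; return True if a failover took place."""
        if self.active is None or self.standby is None:
            raise RuntimeError("call start() first")
        # Steps 2-3: the standby checks the active node manager's status.
        if self.active.alive:
            return False
        # Step 4: the standby starts another standby node manager.
        replacement = NodeManager(next(self._ids), "standby")
        # Step 5: the existing standby becomes the active node manager.
        self.standby.role = "active"
        self.active = self.standby
        self.standby = replacement
        return True
```

Calling `start()`, setting `pair.active.alive = False`, and then calling `monitor_once()` promotes the former standby and leaves a fresh standby in place, which matches the behavior described in the Note below.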

The status of a node manager is checked through a port set in the configuration file. If the port is not set, the server assumes that node manager replication is not used.
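When troubleshooting, it can help to confirm that the configured port is reachable at all. The following is a minimal sketch that only tests whether a TCP port accepts connections; it does not speak the message protocol that the node managers use, which is not described in this chapter. The function name, host, and timeout are assumptions made for illustration.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        # create_connection resolves the host, connects, and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A result of False means only that nothing accepted a connection on that port from this machine; it does not by itself show whether replication is configured.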

Note

Whenever a standby node manager becomes the active node manager because the active node manager failed, a new standby node manager is started to monitor the former standby node manager, which is now active.

5.3. Using a Replicated Node Manager

To replicate a node manager, configure the port that the active node manager and the standby node manager use to send and receive messages. Set this port with the standbyPort item as described in "2.3.1. Configuration File". If you do not want to replicate a node manager, do not configure this port.

Standby Node Manager

A standby node manager starts when an active node manager starts, and monitors the active node manager's status in standby mode.

[nodemanager-1] [NodeManager-0201] The standby node manager is starting.
[nodemanager-1] [NodeManager-0102] Initializing the node manager configuration.

A standby node manager does not process server requests. It saves its status information to a separate log file, which is created in the same directory as the node manager log file and is named with the node manager name followed by the string '_standby'. This log file is used only by standby node managers.
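The naming rule can be expressed as a small helper, for example when a script needs to locate the standby log. This is a sketch under assumptions: the function name is hypothetical, and the ".log" extension is a guess, since this chapter states only that the name is the node manager name followed by '_standby'.

```python
from pathlib import Path


def standby_log_path(log_dir: Path, node_manager_name: str, extension: str = ".log") -> Path:
    """Build the standby log path: same directory, '<name>_standby' plus an extension.

    The extension default is an assumption; pass extension="" if the file has none.
    """
    return log_dir / f"{node_manager_name}_standby{extension}"
```

For a node manager named `nm1` (a made-up name) with logs in `/logs`, the helper returns `/logs/nm1_standby.log`.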

When a standby node manager becomes the active node manager, the change is recorded in the log file for standby node managers. From then on, the new active node manager uses the log file for active node managers, and the new standby node manager uses the log file for standby node managers. The log file for active node managers records information related to server requests and management, and the log file for standby node managers records the start of standby node managers and their communication with the active node manager.

When a standby node manager first connects to the active node manager, it obtains the active node manager's PID and records it in the log. You can use this PID to find the active node manager process.
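Once the PID has been read from the log, a script can check whether that process still exists. The sketch below assumes a POSIX system and uses signal 0, which performs the existence and permission checks without delivering a signal. The function name is hypothetical, and extracting the PID from the log is left out because the log line format is not shown here.

```python
import os


def process_is_running(pid: int) -> bool:
    """POSIX only: report whether a process with this PID currently exists."""
    if os.name != "posix":
        # On Windows, os.kill would terminate the process instead of probing it.
        raise NotImplementedError("signal 0 probing is only safe on POSIX systems")
    if pid <= 0:
        # Zero and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # no process has this PID
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True
```

Because PIDs can be reused by the operating system, a True result shows that some process has that PID, not necessarily that it is still the node manager.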

5.4. Terminating a Replicated Node Manager

A replicated node manager operates without downtime because a standby node manager replaces an active node manager that terminates abnormally. For the same reason, to terminate a replicated node manager you need to use the stopNodeManager script. This script makes the active node manager send a termination message to the standby node manager. The standby node manager then closes the connection, logs its own termination, and terminates safely.

Note

If a node manager is terminated forcibly because of a problem, its standby node manager may keep running. In that case, terminate the standby node manager first, and then terminate the node manager. Using the stopNodeManager script is recommended because forcible termination may cause logs to be recorded incorrectly.
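The difference between the two ways of stopping can be illustrated with a toy model. It is a sketch for intuition only: the classes `Standby` and `Active`, their methods, and the log text are hypothetical and do not reflect the stopNodeManager script's internals or real log messages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Standby:
    """Toy standby node manager (hypothetical model)."""
    connected: bool = True
    running: bool = True
    log: List[str] = field(default_factory=list)

    def on_termination_message(self) -> None:
        # End the connection, log the termination, then stop.
        self.connected = False
        self.log.append("standby terminated")  # placeholder text, not a real log line
        self.running = False


@dataclass
class Active:
    """Toy active node manager (hypothetical model)."""
    standby: Standby
    running: bool = True

    def stop(self) -> None:
        # Graceful stop: notify the standby before exiting.
        self.standby.on_termination_message()
        self.running = False

    def kill(self) -> None:
        # Forced termination: no message reaches the standby.
        self.running = False
```

In this model, `stop()` leaves neither node manager running, while `kill()` leaves the standby running, which corresponds to the situation described in the Note above.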