Chapter 5. JEUS MQ Failover

Table of Contents

5.1. Overview
5.2. Server Failover
5.2.1. Network Configuration
5.2.2. Configuring Connection Factories
5.2.3. Configuring Persistence Stores
5.2.4. Automatic Failback
5.3. Client Failover
5.3.1. Reconnection
5.3.2. Reusing Connection Factories
5.3.3. Reusing Destinations
5.3.4. Request Blocking Time
5.3.5. Connection Recovery
5.3.6. Session Recovery
5.3.7. Transmission Error Message Recovery
5.3.8. Reception Error Message Recovery
5.3.9. Message Loss Prevention and Transactions

This chapter describes how a JMS client can recover from a JEUS MQ server or network failure and re-establish the connection.

It also explains the server configuration and server failure recovery that are required for JMS client recovery.

In JEUS MQ, when a failure occurs, a client uses the failover functionality to automatically reconnect to the server and restore the connection to its state before the failure.

Reasons for failure can be classified into the following two categories.

  • Network Failure

    If a network failure occurs, a JEUS MQ server can no longer communicate with a client. The network or the server may be down temporarily, or the network may be completely unavailable.

    If a network failure occurs, a JEUS MQ client attempts to reconnect to the failed server, or to its backup server. If the attempt succeeds, the client state is recovered automatically, and services become available again.

  • Server Failure

    Server failure includes all types of failures except network failures.

    In general, server failures occur due to disk or database failures, or lack of memory. When a server fails, the standby backup server automatically restores data, and continues to provide the service.

To handle such failures, the network between JEUS MQ servers and required JEUS MQ client settings must be configured. Failover properties can be configured for each client by calling the client API provided by JEUS MQ.

This section describes the network configuration and other required JEUS MQ failover settings.

JEUS MQ failover requires one or more active servers, and they must be clustered.

A standby server is optional and is used to provide a backup system during a failure.

For more information about JEUS clustering configuration, refer to "Chapter 5. JEUS Clustering" in JEUS Domain Guide.

  • Active Server

    The main server that processes client requests during normal operation.

  • Standby Server

    The backup server that continues to provide the services of the active server when it fails.

Note

Starting from JEUS v7.0 Fix#1, JEUS MQ clustering and JEUS MQ failover functions are integrated. Therefore, configuring a JEUS MQ cluster also enables JEUS MQ failover.

Unlike in previous versions, active and standby servers are not configured as a pair. This allows servers to be configured more flexibly. A typical configuration includes multiple active servers and fewer standby servers, or only active servers with no standby servers.

To enable failover, the network between MQ servers is configured as in the following figure.


When an active server fails, one of the available standby servers takes over the operations of the active server. If another failure occurs on any of the active or standby servers, another available standby server takes over the failed server's operations. If no standby server is available, one of the available active servers takes over the operations, thereby playing the role of multiple servers. Services continue to be provided in this way as long as at least one server remains available. When the last available server fails, the JEUS MQ cluster can no longer provide its services.
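
The takeover order described above can be sketched as a small selection routine. This is only an illustrative model, not JEUS code: the class name TakeoverPlanner and the method chooseSuccessor are hypothetical, and picking the first entry of each list is an assumption, since the text does not say which of several available servers is chosen.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the takeover order; not a JEUS API.
class TakeoverPlanner {
    /**
     * Picks the server that takes over a failed server's operations.
     * Standby servers are preferred; an active server is used only when
     * no standby server is available. Taking the first list entry is an
     * assumption made for this sketch.
     */
    static Optional<String> chooseSuccessor(List<String> availableStandbys,
                                            List<String> availableActives) {
        if (!availableStandbys.isEmpty()) {
            return Optional.of(availableStandbys.get(0));
        }
        if (!availableActives.isEmpty()) {
            // This active server now plays the role of multiple servers.
            return Optional.of(availableActives.get(0));
        }
        // No server left: the cluster can no longer provide its services.
        return Optional.empty();
    }
}
```

An empty result corresponds to the case where the last available server has failed.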

Active and Standby Server Configuration

The following example shows how to configure the failover function of the active and standby servers. Go to [Servers] > [Server Name] > [Engine] > [Jms Engine] > [Basic] and the Jms failover configuration screen appears.


If a JEUS MQ client is disconnected from the server due to a server or network failure, the client attempts to reconnect, alternating its attempts between the active server and a standby server. Once reconnected, the client attempts to restore its state to what it was before the disconnection. This client failover process is performed automatically through JEUS MQ configuration, without any change to the client application source code.

This section describes the details and restrictions of client failover process and explains how to handle a failure without message loss.

The "Reconnect Enabled" option determines whether to try to reconnect if the connection between a client and server is disconnected. This applies to all connections that are created through the connection factory. For more information, refer to "5.2.2. Configuring Connection Factories".

To modify the reconnection configuration of a particular connection, use the "jeus.jms.client.facility.connection.JeusConnection" class, which is the JEUS MQ client API.

. . .
import javax.jms.ConnectionFactory;
import javax.naming.Context;
import javax.naming.InitialContext;
import jeus.jms.client.facility.connection.JeusConnection;
. . .
Context ctx = new InitialContext();
// lookup() returns Object, so the result must be cast.
ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
connection.setReconnectEnabled(true);
connection.setReconnectInterval(1000); // 1 second
connection.setReconnectPeriod(3600000); // 1 hour
. . .

When Reconnect Enabled is set to true, the entire reconnection process is automatically performed on the client application without modifying the client source code.
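
To make the effect of the interval and period values more concrete, the following sketch lists the servers a client would try. It is a plain Java model, not part of the JEUS MQ client API. It assumes that the reconnect interval is the delay between attempts, that the reconnect period is the total time during which attempts are made, and that the first attempt goes to the active server. None of these details are confirmed here, and the class name ReconnectPlan is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reconnection schedule; not a JEUS MQ API.
class ReconnectPlan {
    /**
     * Lists the target of each reconnection attempt, one attempt per
     * interval, until the reconnect period is used up. Attempts alternate
     * between the active and the standby server, starting with the active
     * server (the starting server is an assumption).
     */
    static List<String> targets(long intervalMillis, long periodMillis,
                                String active, String standby) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<String> attempts = new ArrayList<>();
        for (long elapsed = intervalMillis; elapsed <= periodMillis; elapsed += intervalMillis) {
            // Even-numbered attempts go to the active server, odd ones to the standby.
            attempts.add(attempts.size() % 2 == 0 ? active : standby);
        }
        return attempts;
    }
}
```

Under these assumptions, an interval of 1000 and a period of 3600000 allow up to 3600 attempts before the client gives up.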

When connection recovery is not configured, JEUS MQ connections share physical connections (sockets) by default. If the <reconnect-enabled> element of the connection factory configuration in domain.xml is set to true, each connection has a 1:1 relationship with a physical connection so that failover can be performed.

Note

When physical and logical connections establish a 1:1 relationship, a new physical connection has to be created whenever a new connection is created. Since this may result in performance degradation, the client application must be implemented to reuse connections without having to create a new one each time.

When a connection recovers, the connection state is also recovered.

  • Start state

    If Connection.start() has been called to receive messages, then it continues to receive messages after the connection recovers.

  • Stop state

    If Connection.stop() has been called to stop receiving messages, then it does not receive messages after the connection recovers.

Other objects created by using the connection object, including sessions and connection consumers, are all restored.

After recovering from the failure, the methods that create sessions or connection consumers re-send their requests and wait for a response. If Connection.close() is invoked, recovery is not performed regardless of whether a response has been received.

Sessions are automatically restored during the connection recovery process unless Session.close() is called. In addition, other objects derived from the connection object including MessageConsumers or MessageProducers are all restored.

A session implements methods for creating various objects. The following shows how each method is used after recovery.

If an error occurs in the session, the session transaction is affected in the following cases.

  • commit()

    If an error occurs while sending or receiving a message through a message producer or consumer created by the transacted session, a "javax.jms.TransactionRolledBackException" is raised at the first commit point and the transaction is rolled back. If there are no messages to commit at the commit point, the exception does not occur. Even after a commit operation fails, subsequent commit operations are executed normally.

    If a failure that occurred during the commit operation is not recovered during the RequestBlockingTime, a JMSException is raised. In this case, the commit operation must be checked by using the administration tool.

  • rollback()

    rollback() completes the rollback request after the failure is recovered. If a failure that occurred during the rollback operation is not recovered within the RequestBlockingTime, a JMSException is raised. Even after a JMSException, the rollback operation is performed normally.

The Session.recover() method completes the recover request after the failure has been recovered. If an error occurs after Session.recover() is called and the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.
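
The common pattern behind commit(), rollback(), and Session.recover() is that a pending request waits for recovery, but only for as long as the RequestBlockingTime. The following is a minimal sketch of that pattern in plain Java. The real waiting happens inside the JEUS MQ client library; the class BlockedRequest and the exception RequestTimedOutException are hypothetical stand-ins, and the stand-in exception is unchecked only to keep the sketch short (JMSException is a checked exception).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a request blocked during a failure; not a JEUS API.
class BlockedRequest {
    /** Stand-in for the JMSException that the client application would see. */
    static class RequestTimedOutException extends RuntimeException {
        RequestTimedOutException(String message) {
            super(message);
        }
    }

    /**
     * Waits until the failure is recovered, but no longer than the request
     * blocking time. Returns normally when recovery happens in time, so the
     * pending request can complete; otherwise throws.
     */
    static void awaitRecovery(CountDownLatch recovered, long requestBlockingTimeMillis) {
        boolean inTime;
        try {
            inTime = recovered.await(requestBlockingTimeMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            inTime = false;
        }
        if (!inTime) {
            throw new RequestTimedOutException(
                "failure not recovered within " + requestBlockingTimeMillis + " ms");
        }
    }
}
```

In this model, the reconnection logic would call countDown() on the latch once the connection is restored.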

When the acknowledge mode of the session is set to Session.CLIENT_ACKNOWLEDGE, Message.acknowledge() is issued for the unacknowledged messages that exist in the session. If an error occurs during the acknowledgement, a jeus.jms.common.message.MessageAcknowledgeException is delivered to the ExceptionListener for each message. The exception indicates that an error occurred during message acknowledgement and that the message may be re-delivered.

Note

The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().

This section describes how to handle the errors that occur while sending messages through message producers.

The send() method of the message producer is blocked until the message is sent to the server and a response is returned. The following describes possible error scenarios for this process.

  • The send() method is called, but the message has not yet been sent.

    After recovery, the message is sent to the server and processed successfully. If the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.

  • The send() method is called, and the message was processed on the server. However, a network error occurs.

    If the client reconnects to the server after recovery, the response message is received successfully. If the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.

  • The send() method is called, and the message was processed on the server. However, a server error occurs, and then the server recovers.

    Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.

  • The send() method is called, and the message has not yet been processed on the server. However, a network or server error occurs.

    Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.

    Note

    The MessageID of the failed message can be obtained by calling MessageSendException.getErrorCode().

JEUS supports synchronous and asynchronous message reception methods. Recovery is performed differently for each method. When messages are received synchronously, the following methods are used for recovery.

Recovery of Synchronously Received Messages

A message consumer implements three methods for synchronously receiving messages, MessageConsumer.receive(), MessageConsumer.receive(long timeout), and MessageConsumer.receiveNoWait().

The following describes what happens when an error occurs during each method call.

  • receive()

    This method normally blocks until a message is received. When an error occurs, the wait time is changed to the RequestBlockingTime to prevent an infinite wait. If the failure is recovered before the time expires, the request message is re-delivered. Otherwise, a JMSException is raised.

    If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the received message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.

  • receive(long timeout)

    This method normally blocks until a message is received or the timeout expires. When an error occurs, a timeout value greater than the RequestBlockingTime is reduced to the RequestBlockingTime to prevent an excessively long wait. If the failure is recovered before the time expires, the request message is re-delivered. Otherwise, a JMSException is raised.

    If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.

  • receiveNoWait()

    This method does not block even if no message is available. It returns immediately even if an error occurs.
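
The wait-time rules for the three methods can be summarized in a few lines of Java. This is a hypothetical helper written only to restate the rules above, not a JEUS API. It also assumes the standard JMS convention that a timeout of 0 passed to receive(long timeout) means an unbounded wait, and treats that case like receive().

```java
// Hypothetical helper that restates the wait-time rules; not a JEUS API.
class ReceiveWait {
    /** receive(): the unbounded wait is replaced by the request blocking time. */
    static long forReceive(long requestBlockingTimeMillis) {
        return requestBlockingTimeMillis;
    }

    /** receive(timeout): the timeout is capped at the request blocking time. */
    static long forReceive(long timeoutMillis, long requestBlockingTimeMillis) {
        if (timeoutMillis == 0) {
            // In JMS, a timeout of 0 never expires, so it is treated as unbounded.
            return requestBlockingTimeMillis;
        }
        return Math.min(timeoutMillis, requestBlockingTimeMillis);
    }

    /** receiveNoWait(): never waits, whether or not an error occurred. */
    static long forReceiveNoWait() {
        return 0L;
    }
}
```

For example, with a RequestBlockingTime of 2000 milliseconds, a call to receive(5000) made during a failure would wait at most 2000 milliseconds, while receive(500) would keep its own timeout.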

Recovery of Asynchronously Received Messages

An asynchronously received message is in one of three states: being processed by MessageListener.onMessage, being acknowledged after MessageListener.onMessage has processed it, or prefetched and waiting in the client queue.

JEUS MQ performs failover for each of these messages.

  • If an error occurs during onMessage, the acknowledgement is sent after the failure is recovered, and the message is processed normally.

  • If an error occurs while sending an acknowledgement after onMessage is processed, the ExceptionListener will issue a jeus.jms.common.message.MessageAcknowledgeException for the message that has not been acknowledged. The exception notifies that a failure occurred during message acknowledgement, and the message may be re-delivered.

  • Messages queued in the client queue through prefetching are returned to the server after the failure is recovered, and are later sent to the client again. Message.getJMSRedelivered() may return "true" for these messages.

Note

The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
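
Because a message may be re-delivered after a failure, a consumer may see the same message more than once. One common way to cope with this, shown here only as a sketch and not as a JEUS MQ feature, is to make message handling idempotent by remembering the IDs of messages that were already processed. The class RedeliveryGuard is hypothetical, the String IDs stand in for the value of Message.getJMSMessageID(), and the in-memory set is an assumption that would not survive a client restart.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical consumer-side guard against duplicate processing; not a JEUS API.
class RedeliveryGuard {
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<String> businessLogic;

    RedeliveryGuard(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    /**
     * Runs the business logic once per message ID. Returns false when the
     * ID was seen before, for example a copy re-delivered after failover.
     */
    synchronized boolean handle(String messageId, String body) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate: skip processing
        }
        businessLogic.accept(body);
        return true;
    }
}
```

A MessageListener could call handle() from onMessage so that a re-delivered copy is acknowledged without being processed a second time.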