This chapter describes how a JMS client can recover from a JEUS MQ server or network failure and re-establish the connection.
It also explains the server configuration and server failure recovery that are required for JMS client recovery.
In JEUS MQ, when a failure occurs, a client automatically reconnects to the server and restores the connection to its state before the failure by using the failover functionality.
Reasons for failure can be classified into the following two categories.
Network Failure
If a network failure occurs, a JEUS MQ server can no longer communicate with a client. The network may be down temporarily, the server may be down, or the network may be completely unavailable.
If a network failure occurs, a JEUS MQ client attempts to reconnect to the failed server, or to its backup server. If the attempt succeeds, the client state is recovered automatically, and services become available again.
Server Failure
Server failure includes all types of failures except network failures.
In general, server failures occur due to disk or database failures, or lack of memory. When a server fails, the standby backup server automatically restores data, and continues to provide the service.
To handle such failures, the network between JEUS MQ servers and required JEUS MQ client settings must be configured. Failover properties can be configured for each client by calling the client API provided by JEUS MQ.
This section describes the network configuration and other required JEUS MQ failover settings.
JEUS MQ failover requires one or more active servers, and they must be clustered.
A standby server is optional and is used to provide a backup system during a failure.
For more information about JEUS clustering configuration, refer to "Chapter 5. JEUS Clustering" in JEUS Domain Guide.
Active Server
The main server that processes client requests during normal operation.
Standby Server
The backup server that continues to provide the services of the active server when it fails.
Starting from JEUS v7.0 Fix#1, JEUS MQ clustering and JEUS MQ failover functions are integrated. Therefore, configuring a JEUS MQ cluster also enables JEUS MQ failover.
Unlike in previous versions, active and standby servers are not configured as pairs. This allows servers to be configured more flexibly. A general configuration usually includes multiple active servers and fewer standby servers, or only active servers with no standby servers.
To enable failover, the network between MQ servers is configured as in the following figure.
When an active server fails, one of the available standby servers takes over the operations of the active server. If another failure occurs on any of the active or standby servers, another available standby server takes over the failed server's operations. If no standby server is available, one of the available active servers takes over the operations and thereby plays the role of multiple servers. Services continue to be provided in this way as long as at least one server remains available. When the last available server fails, the JEUS MQ cluster can no longer provide its services.
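The takeover order described above can be pictured with a small model. The following is a minimal sketch only: the TakeoverSelector, Member, and Role types are hypothetical and are not part of JEUS MQ, and the actual selection is performed inside the cluster.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative model only; these types are hypothetical and not part of JEUS MQ. */
final class TakeoverSelector {

    enum Role { ACTIVE, STANDBY }

    /** A cluster member as seen by this sketch: a name, a role, and a liveness flag. */
    static final class Member {
        final String name;
        final Role role;
        final boolean available;

        Member(String name, Role role, boolean available) {
            this.name = name;
            this.role = role;
            this.available = available;
        }
    }

    /**
     * Picks the server that takes over for the failed one: an available standby
     * server first, otherwise an available active server. An empty result means
     * that no server is left and the cluster can no longer provide its services.
     */
    static Optional<Member> chooseSuccessor(List<Member> members, String failedName) {
        Optional<Member> standby = firstAvailable(members, failedName, Role.STANDBY);
        return standby.isPresent() ? standby : firstAvailable(members, failedName, Role.ACTIVE);
    }

    private static Optional<Member> firstAvailable(List<Member> members, String failedName, Role role) {
        return members.stream()
                .filter(m -> m.available && m.role == role && !m.name.equals(failedName))
                .findFirst();
    }
}
```

With two active servers and one standby server, the standby server is chosen when the first active server fails. If the standby server is also unavailable, the remaining active server is chosen.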
The following example shows how to configure the failover function of the active and standby servers. Go to [Servers] > [Server Name] > [Engine] > [Jms Engine] > [Basic] and the Jms failover configuration screen appears.
Active Server Configuration
To configure an active server, either check the Fail Over checkbox and select Active, or do not check the Fail Over checkbox at all.
Failover enables connection factories to redirect connection requests when the JEUS MQ server is unavailable.
To configure a connection factory in WebAdmin, go to [Servers] > [Server Name] > [Engine] > [Jms Engine] > [Connection Factory]. Then, select a connection factory to configure from the list.
The Reconnect Enabled option determines whether to attempt to reconnect when the connection between a client and server is lost. The default value is false. When Reconnect Enabled is checked, a JEUS MQ client attempts to reconnect to the active server and the standby server, alternating the attempts between the two servers.
When the DeliveryMode is set to PERSISTENT, messages are saved in a persistence store.
When a server fails, another active or standby server can recover the messages of the failed server from the persistence store to provide seamless services. The persistence store is a key resource of JEUS MQ failover function.
To configure a persistence store for JEUS MQ failover, the persistence store must be in a path that can be accessed by the active and standby servers.
To use the journal log as the persistence store, the base journal log directory (Base Dir of the journal log configuration) has to be in a directory that can be accessed by the active and standby servers. This requires setting up disk sharing hardware such as a SAN and creating the journal log base directory on it.
If a server cannot access the persistence store, failover will be attempted with another server that can access the persistence store.
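Before relying on failover, it can help to confirm on each server host that the shared directory is actually usable. The following is a minimal sketch of such a pre-flight check; the StoreAccessCheck class is hypothetical, it is not a JEUS tool, and the path to check is whatever has been set as the Base Dir.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-flight check; run it on every active and standby server host. */
final class StoreAccessCheck {

    /** Returns true when the directory exists and this process can both read and write it. */
    static boolean isUsable(Path baseDir) {
        return Files.isDirectory(baseDir)
                && Files.isReadable(baseDir)
                && Files.isWritable(baseDir);
    }
}
```

The check only covers file system permissions as seen by the calling process. It does not verify that all hosts see the same shared volume.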
When an active server fails and fails over to another active or standby server, the server administrator must quickly identify possible reasons for failure and recover and restore the failed server.
When the active server restarts, the data is migrated from the backup server to the active server, and the connected clients are also reconnected to the restarted server. This process is called failback. Failback is always performed automatically.
If a JEUS MQ client is disconnected from the server due to a server or network failure, the client attempts to reconnect to an active server and a standby server, alternating the attempts between the two servers. If successfully reconnected, the client attempts to restore its state to what it was before being disconnected. This client failover process is performed automatically through JEUS MQ configurations, without changing the client application source code.
This section describes the details and restrictions of client failover process and explains how to handle a failure without message loss.
The "Reconnect Enabled" option determines whether to try to reconnect if the connection between a client and server is disconnected. This applies to all connections that are created through the connection factory. For more information, refer to "5.2.2. Configuring Connection Factories".
To modify the reconnection configuration of a particular connection, use the "jeus.jms.client.facility.connection.JeusConnection" class, which is the JEUS MQ client API.
    . . .
    import javax.jms.ConnectionFactory;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import jeus.jms.client.facility.connection.JeusConnection;
    . . .
    Context ctx = new InitialContext();
    // lookup() returns Object, so the result must be cast.
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
    JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
    connection.setReconnectEnabled(true);
    connection.setReconnectInterval(1000);   // 1 second
    connection.setReconnectPeriod(3600000);  // 1 hour
    . . .
When Reconnect Enabled is set to true, the entire reconnection process is automatically performed on the client application without modifying the client source code.
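The interplay of the reconnect interval, the reconnect period, and the alternation between servers can be sketched as a simple plan of attempts. This is an illustrative model under stated assumptions: it assumes that one attempt is made per interval until the period has elapsed, which this guide does not spell out, and the ReconnectPlan class is hypothetical. The real retry logic is internal to the JEUS MQ client.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; the real retry logic lives inside the JEUS MQ client. */
final class ReconnectPlan {

    /**
     * Lists the reconnect targets in order, alternating between the active and
     * the standby server. Assumption of this sketch: one attempt is made every
     * intervalMillis until periodMillis has elapsed.
     */
    static List<String> targets(String active, String standby,
                                long intervalMillis, long periodMillis) {
        if (intervalMillis <= 0 || periodMillis < 0) {
            throw new IllegalArgumentException("interval must be > 0 and period must be >= 0");
        }
        List<String> plan = new ArrayList<>();
        boolean tryActive = true;
        for (long elapsed = 0; elapsed < periodMillis; elapsed += intervalMillis) {
            plan.add(tryActive ? active : standby);
            tryActive = !tryActive; // switch to the other server for the next attempt
        }
        return plan;
    }
}
```

Under these assumptions, an interval of 1000 ms and a period of 4000 ms give four attempts in the order active, standby, active, standby.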
In JEUS MQ, active and standby servers use the same connection factories. Once a connection factory is obtained through JNDI Lookup, it can be reused without having to look it up again when a server or network failure occurs.
Like connection factories, active and standby servers use the same destinations. Once a destination is obtained through JNDI Lookup, it can be reused without having to look it up again when a server or network failure occurs.
Since the messages that have been stored at the destination are all automatically restored, the client can continue to process the messages through the destination.
All requests sent from JEUS MQ clients wait for a response from the server for a specific amount of time. (Default value: 200000, Unit: ms)
This is configured in the Request Blocking Time option in the Connection Factory page.
Connection-specific settings can be configured by using the JEUS MQ Client API "jeus.jms.client.facility.connection.JeusConnection" class.
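    . . .
    import javax.jms.ConnectionFactory;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import jeus.jms.client.facility.connection.JeusConnection;
    . . .
    Context ctx = new InitialContext();
    // lookup() returns Object, so the result must be cast.
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
    JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
    connection.setRequestBlockingTime(300000); // 5 minutes
    . . .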
RequestBlockingTime is also used as the default transaction timeout value for session or XA transaction.
When connection recovery is not configured, JEUS MQ connections share physical connections (sockets) by default. If the <reconnect-enabled> element of the connection factory configuration in domain.xml is set to true to enable failover, each connection has a 1:1 relationship with a physical connection.
When physical and logical connections have a 1:1 relationship, a new physical connection has to be created whenever a new connection is created. Since this may result in performance degradation, the client application must be implemented to reuse connections instead of creating a new one each time.
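One common way to reuse a connection is to create it once and hand out the same instance afterwards. The following is a minimal sketch, assuming a hypothetical SharedResource holder that is not part of JEUS MQ. In a JMS client, the supplier would wrap the call that creates the connection from the connection factory.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical holder that creates a costly resource once and then returns the same instance. */
final class SharedResource<T> {

    private final Supplier<T> factory;
    private T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    /** Creates the resource on first use; later calls return the instance created then. */
    synchronized T get() {
        if (instance == null) {
            instance = Objects.requireNonNull(factory.get(), "factory returned null");
        }
        return instance;
    }
}
```

Because get() is synchronized, concurrent callers still share a single instance, so only one physical connection is opened for them.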
When a connection recovers, the connection state is also recovered.
Start state
If Connection.start() has been called to receive messages, then it continues to receive messages after the connection recovers.
Stop state
If Connection.stop() has been called to stop receiving messages, then it does not receive messages after the connection recovers.
Other objects created by using the connection object including sessions and connection consumers are all restored.
Sessions that were created through a connection are restored when the connection recovers, unless Session.close() was called before the failure. For more information, refer to "5.3.6. Session Recovery".
Connection consumers that were created through a connection are restored when the connection recovers, unless ConnectionConsumer.close() was called before the failure. If the connection was in the start state before the failure, the consumer starts receiving messages again after the recovery. Since the messages that were received before the failure are all returned to the server and retrieved again, the Message.getJMSRedelivered() method call for these messages may return "true".
After recovering from the failure, the methods that create sessions or connection consumers re-send their requests and wait for a response. If Connection.close() is invoked, recovery is not performed regardless of whether or not a response is issued.
Sessions are automatically restored during the connection recovery process unless Session.close() is called. In addition, other objects derived from the connection object including MessageConsumers or MessageProducers are all restored.
A session implements methods for creating various objects. The following shows how each method is used after recovery.
Message Consumer Creation Method
Message consumer creation methods complete the request after the recovery. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
    createConsumer(Destination destination)
    createConsumer(Destination destination, java.lang.String messageSelector)
    createConsumer(Destination destination, java.lang.String messageSelector, boolean NoLocal)
Durable Message Subscriber Creation Method
Durable message subscriber creation methods complete the request after the recovery. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
    createDurableSubscriber(Topic topic, String name)
    createDurableSubscriber(Topic topic, String name, String messageSelector, boolean noLocal)
If an error occurs in the session, the session transaction is affected in the following cases.
commit()
If an error occurs while sending or receiving a message through a message producer or consumer created by a transacted session, a "javax.jms.TransactionRolledBackException" is raised at the first commit point and the transaction is rolled back. If there are no messages to commit at the commit point, the exception does not occur. Even after the failure of a commit operation, the subsequent commit operations for the session are executed normally.
If a failure that occurred during the commit operation is not recovered during the RequestBlockingTime, a JMSException is raised. In this case, the commit operation must be checked by using the administration tool.
rollback()
rollback() completes a rollback request after the failure is recovered. If a failure that occurred during the rollback operation is not recovered during the RequestBlockingTime, a JMSException is raised. Even after a JMSException, the rollback operation is performed normally.
The Session.recover() method completes the recover request after the failure has been recovered. If an error occurs after Session.recover() is issued and the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.
When the acknowledge mode of the session is configured to Session.CLIENT_ACKNOWLEDGE, Message.acknowledge() will be issued for the unacknowledged messages that exist in the session. If an error occurs during the acknowledgement, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for each message. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
This section describes how to handle the errors that occur while sending messages through message producers.
The send() method of the message producer is blocked until the message is sent to the server and a response is returned. The following describes possible error scenarios for this process.
The send() method is called, but the message has not yet been sent.
After recovery, the message is sent to the server and processed successfully. If the failure is not recovered after the RequestBlockingTime expires, a JMSException is raised.
The send() method is called, and the message was processed on the server. However, a network error occurs.
If the server is reconnected after recovery, a response message is successfully issued. If the failure is not recovered after the RequestBlockingTime expires, a JMSException is raised.
The send() method is called, and the message was processed on the server. However, a server error occurs, and then the server recovers.
Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
The send() method is called, and the message has not yet been processed on the server. However, a network or server error occurs.
Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
The MessageID of the failed message can be obtained by calling MessageSendException.getErrorCode().
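Because the outcome of such a send is unknown, an application may want to record the reported MessageIDs so that each message can be checked later. The following is a minimal sketch of such a record, assuming a hypothetical FailedSendLog class that the application's ExceptionListener would call with the ID taken from the exception. It uses only the standard library and is not part of the JEUS MQ API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical application-side record of message IDs whose send outcome is unknown. */
final class FailedSendLog {

    private final Set<String> ids = ConcurrentHashMap.newKeySet();

    /** Records a reported message ID; returns false if the same ID was already recorded. */
    boolean report(String messageId) {
        return ids.add(messageId);
    }

    /** Returns the IDs collected so far and removes them from the log. */
    List<String> drain() {
        List<String> snapshot = new ArrayList<>(ids);
        ids.removeAll(snapshot);
        return snapshot;
    }
}
```

A reconciliation step can then call drain() and decide for each ID whether the message has to be sent again.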
JEUS supports synchronous and asynchronous message reception methods. Recovery is performed differently for each method. When messages are received synchronously, the following methods are used for recovery.
A message consumer implements three methods for synchronously receiving messages, MessageConsumer.receive(), MessageConsumer.receive(long timeout), and MessageConsumer.receiveNoWait().
The following describes what happens when an error occurs during each method call.
receive()
This method normally blocks until a message is received. When an error occurs, the wait time is changed to the RequestBlockingTime to prevent an infinite wait. If the failure is recovered before the time expires, the request message is re-delivered. Otherwise, a JMSException is raised.
If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the received message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.
receive(long timeout)
This method normally blocks until a message is received or the timeout expires. When an error occurs, a timeout value that is greater than the RequestBlockingTime is reduced to the RequestBlockingTime to bound the wait. If the failure is recovered before the time expires, the request message is re-delivered. Otherwise, a JMSException is raised.
If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.
receiveNoWait()
This method does not block even if it does not receive a message. It returns immediately even if an error occurs.
Asynchronously received messages are in one of three states: being processed by MessageListener.onMessage, being acknowledged after MessageListener.onMessage has processed them, or prefetched and queued in the client queue.
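The wait time that applies to receive() and receive(long timeout) after an error can be summarized as a small calculation. This is a minimal sketch, assuming a hypothetical ReceiveWait helper. It treats a timeout of 0 as "no timeout", which is how the JMS API defines receive(long timeout), and it does not cover receiveNoWait(), which never blocks.

```java
/** Hypothetical helper that models the wait time used after an error occurs. */
final class ReceiveWait {

    /** In the JMS API, a timeout of 0 means that the call never times out. */
    static final long NO_TIMEOUT = 0L;

    /**
     * Returns the time to wait once an error has occurred: an unbounded wait is
     * replaced by the RequestBlockingTime, and a longer timeout is capped to it.
     */
    static long effectiveWaitMillis(long requestedTimeoutMillis, long requestBlockingTimeMillis) {
        if (requestedTimeoutMillis == NO_TIMEOUT) {
            return requestBlockingTimeMillis;
        }
        return Math.min(requestedTimeoutMillis, requestBlockingTimeMillis);
    }
}
```

With the default RequestBlockingTime of 200000 ms, a receive() call waits at most 200000 ms after an error, a receive(5000) call still waits 5000 ms, and a receive(600000) call is capped to 200000 ms.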
JEUS MQ performs failover for each of these messages.
If an error occurs during onMessage, an acknowledgement is sent after the failure is recovered, and the message is processed normally.
If an error occurs while sending an acknowledgement after onMessage is processed, the ExceptionListener will issue a jeus.jms.common.message.MessageAcknowledgeException for the message that has not been acknowledged. The exception notifies that a failure occurred during message acknowledgement, and the message may be re-delivered.
The messages queued on the client queue through prefetching are returned to the server after the failure is recovered, and are later delivered to the client again. The Message.getJMSRedelivered() method call for these messages may return "true".
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
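Since a message may be delivered again after a failure, a listener whose processing must not run twice can filter duplicates by message ID. The following is a minimal sketch of that idea, assuming that a re-delivered message carries the same message ID as the original. The IdempotentHandler class is hypothetical, it keeps the IDs in memory only, and its set grows without bound, so a real application would need to persist and expire the entries.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical wrapper that runs the message processing at most once per message ID. */
final class IdempotentHandler {

    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final Consumer<String> delegate;

    IdempotentHandler(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    /** Processes the body unless the ID was handled before; returns false for a duplicate. */
    boolean handle(String messageId, String body) {
        if (!processed.add(messageId)) {
            return false; // already handled, typically a re-delivered copy
        }
        try {
            delegate.accept(body);
            return true;
        } catch (RuntimeException e) {
            processed.remove(messageId); // let a later re-delivery retry the processing
            throw e;
        }
    }
}
```

Inside onMessage, the application would pass the message ID and the message content to handle() and skip the message when the method returns false.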
JEUS MQ failover is processed automatically and transparently in a client application. However, messages that may have been lost during transmission must be handled separately through the ExceptionListener.
Message loss in an enterprise messaging application can be critical. The only way to perfectly recover from a failure while preventing message loss is to use transactions.
It is strongly recommended to use the following method to create an application.
In the J2EE environment, messages have to be sent and received inside a transaction.
For Servlets, the UserTransaction must be looked up through JNDI, and messages must be sent and received inside the UserTransaction.
For EJBs, the TransactionAttribute of the EJB method must be set to "Required" or "RequiresNew" so that the messages can be sent and received inside a transaction.
General Java clients call the Connection.createSession(true, Session.SESSION_TRANSACTED) method to create a session. Such sessions send and receive messages inside a transaction, which is completed by calling commit() or rollback().
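The commit-or-rollback pattern for a transacted session can be sketched as follows. To keep the example self-contained, it uses a hypothetical Tx interface as a stand-in for the commit() and rollback() methods of javax.jms.Session, and a hypothetical Work interface for the code that sends and receives messages. It is an illustration of the pattern, not JEUS MQ code.

```java
/** Hypothetical helper that commits the work on success and rolls it back on any failure. */
final class TransactedWork {

    /** Stand-in for the commit() and rollback() methods of a transacted javax.jms.Session. */
    interface Tx {
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    /** The sends and receives that must succeed or fail as one unit. */
    interface Work {
        void run() throws Exception;
    }

    /** Runs the work and commits; on any failure, rolls back and reports the cause. */
    static void runInTransaction(Tx tx, Work work) {
        try {
            work.run();
            tx.commit();
        } catch (Exception failure) {
            try {
                tx.rollback();
            } catch (Exception rollbackFailure) {
                failure.addSuppressed(rollbackFailure); // keep both causes visible
            }
            throw new IllegalStateException("transaction rolled back", failure);
        }
    }
}
```

With a real session, the Tx methods would delegate to session.commit() and session.rollback(), and the caller can retry the whole unit of work after catching the exception.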