This chapter describes how a JMS client can recover from a JEUS MQ server or network failure and re-establish the connection. It also explains the server configuration and server failure recovery that are required for JMS client recovery.
In JEUS MQ, when a failure occurs, the client automatically reconnects to the server and, by using the failover functionality, restores the connection to its state before the failure.
Reasons for failure can be classified into the following two categories.
Network Failure
If a network failure occurs, a JEUS MQ server can no longer communicate with a client. The network may be temporarily down or completely unavailable, or the server may be down.
If a network failure occurs, a JEUS MQ client attempts to reconnect to the failed server or to its backup server. If the attempt succeeds, the client state is recovered, and services become available again.
Server Failure
Server failure includes all types of failures except network failures.
In general, server failures occur due to disk or database failures or a lack of memory. When a server fails, the standby backup server automatically restores data, and continues to provide the service.
To handle such failures, the network between JEUS MQ servers and required JEUS MQ client settings must be configured. Failover properties can be configured for each client by calling the client API provided by JEUS MQ.
This section describes the network configuration and other required JEUS MQ failover settings.
To use JEUS MQ failover, one or more active servers must be clustered.
A standby server provides backup support during a failure and is optional.
For more information about JEUS clustering configuration, refer to "Chapter 5. JEUS Clustering" in the JEUS Domain Guide.
Active Server
The main server that processes client requests during normal operation.
Standby Server
The backup server that provides the services of the active server when it fails.
JEUS MQ clustering and JEUS MQ failover functions are integrated. Therefore, configuring a JEUS MQ cluster also enables JEUS MQ failover. Unlike in previous versions, active and standby servers are not configured as a pair, which allows servers to be configured more flexibly. A typical configuration consists of many active servers and some standby servers, or of active servers only.
To enable failover, the network between MQ servers is configured as in the following figure.
When an active server fails, one of the available standby servers takes over the operations of the failed active server. If another failure occurs on any of the active or standby servers, another available standby server takes over the failed server's operations. If no standby server is available, one of the available active servers takes over, thereby providing the services that two servers normally provide. This continues until only one server is available. When the last available server fails, the JEUS MQ failover service no longer works.
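The takeover order described above can be sketched as a small model in plain Java. This is only an illustration, not JEUS MQ code: `TakeoverModel`, `Server`, `chooseSuccessor`, and `fail` are hypothetical names, and picking the first candidate in list order is an assumption, since the text does not say how JEUS MQ chooses among several available servers.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the takeover order described above; not JEUS MQ code.
class TakeoverModel {
    enum Role { ACTIVE, STANDBY }

    static final class Server {
        final String name;
        final Role role;
        Server(String name, Role role) { this.name = name; this.role = role; }
    }

    // Picks the server that takes over: an available standby first,
    // otherwise an available active server (assumption: first in list order).
    static Optional<Server> chooseSuccessor(List<Server> available) {
        for (Server s : available) {
            if (s.role == Role.STANDBY) {
                return Optional.of(s);
            }
        }
        return available.stream().findFirst();
    }

    // Removes the failed server from the list and returns the name of its
    // successor, or "none" when no server is left to take over.
    static String fail(List<Server> available, String failedName) {
        available.removeIf(s -> s.name.equals(failedName));
        return chooseSuccessor(available).map(s -> s.name).orElse("none");
    }
}
```

With two active servers and one standby server, failing the servers one after another yields the standby first, then the remaining active server, and finally no successor at all.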
The following example shows how to configure the failover function of the active and standby servers. Go to [Servers] > [Server Name] > [Engine] > [JMS Engine] > [Basic] and the JMS failover configuration screen appears.
Active Server Configuration
To configure the failover function of the active server, select Active for the Engine Role item.
Failover enables connection factories to redirect connection requests when a JEUS MQ server is unavailable.
To configure a connection factory in WebAdmin, go to [Servers] > [Server Name] > [Engine] > [JMS Engine] > [Connection Factory]. Then, select a connection factory from the list.
Reconnect Enabled determines whether a client attempts to reconnect when the connection between the client and the server is lost. The default value is false. When Reconnect Enabled is checked, a JEUS MQ client attempts to reconnect, alternating between the active server and the standby server.
When the DeliveryMode is set to PERSISTENT, messages are saved in a persistence store.
When a server fails, another active or standby server can retrieve the messages of the failed server from the persistence store to provide seamless services. The persistence store is a key resource of JEUS MQ failover function.
Before configuring a persistence store for JEUS MQ failover, the persistence store must be in a path that can be accessed by the active and standby servers.
To use the journal log as the persistence store, the base journal log directory (Base Dir of the journal log configuration) has to be under a directory that can be accessed by the active and standby servers. This requires setting up shared-disk hardware such as a SAN and creating the journal log base directory on it.
If a server cannot access the persistence store, failover will be attempted with another server that can access the persistence store.
When an active server fails over to another active or standby server, the server administrator must quickly identify possible reasons for failure and restore the failed server.
When the active server restarts, the data is migrated from the backup server to the active server, and the connected clients are reconnected to the restarted server. This process is called failback. Failback is always performed automatically.
If a JEUS MQ client is disconnected from the server due to a server or network failure, the client attempts to reconnect, alternating between the active server and the standby server. Once reconnected, the client attempts to restore the connection to the state it was in before being disconnected. This client failover process is performed automatically through JEUS MQ configuration, without changes to the client application source code.
This section describes the details and restrictions of client failover process and explains how to handle a failure without message loss.
The "Reconnect Enabled" option determines whether to try to reconnect if the connection between a client and server is disconnected. This applies to all connections that are created through the connection factory. For more information, refer to "5.2.2. Configuring Connection Factories".
To modify the reconnection configuration of a particular connection, use the "jeus.jms.client.facility.connection.JeusConnection" class, which is the JEUS MQ client API.
    . . .
    import jeus.jms.client.facility.connection.JeusConnection;
    . . .
    Context ctx = new InitialContext();
    // Context.lookup() returns Object, so the result must be cast.
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
    JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
    connection.setReconnectEnabled(true);
    connection.setReconnectInterval(1000); // 1 second
    connection.setReconnectPeriod(3600000); // 1 hour
    . . .
When Reconnect Enabled is set to true, the entire reconnection process is automatically performed on the client application without modifying the client source code.
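The alternating reconnection attempts described in this section can be pictured with a minimal sketch in plain Java. It is not the JEUS MQ client implementation. `ReconnectModel` and its `reconnect` method are hypothetical, and the sketch assumes that the reconnect interval is the delay between attempts and that the reconnect period is the total time during which attempts continue. Time is simulated, so nothing actually sleeps.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model of an alternating reconnect loop; not the JEUS MQ client implementation.
class ReconnectModel {
    // Tries the servers in turn, one attempt per intervalMs of simulated time,
    // until periodMs has elapsed. Returns the server that accepted the
    // connection, or empty if the period expired first.
    static Optional<String> reconnect(List<String> servers, long intervalMs, long periodMs,
                                      Predicate<String> tryConnect) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        if (servers.isEmpty()) {
            return Optional.empty();
        }
        int attempt = 0;
        for (long elapsed = 0; elapsed < periodMs; elapsed += intervalMs) {
            String target = servers.get(attempt % servers.size());
            if (tryConnect.test(target)) {
                return Optional.of(target);
            }
            attempt++;
            // A real client would wait here, for example with Thread.sleep(intervalMs).
        }
        return Optional.empty();
    }
}
```

Passing the active and standby server names as the list makes the attempts alternate between the two, which matches the behavior described for Reconnect Enabled.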
In JEUS MQ, active and standby servers use the same connection factory. Once a connection factory is obtained through a JNDI lookup, it can be reused without having to look it up again when a server or network failure occurs.
Like connection factories, active and standby servers share the same destination name. Once a destination is obtained through JNDI lookup, it can be reused without having to look it up again when a server or network failure occurs.
When a server fails, all the messages stored at the destination are restored, and the client can continue to process the messages by using the destination.
All requests sent from JEUS MQ clients wait for a response from the server for a specific amount of time (default: 200000 ms). This wait time is configured with the Request Blocking Time option on the Connection Factory page.
To configure settings for each connection, you can use the JEUS MQ Client API "jeus.jms.client.facility.connection.JeusConnection" class.
    . . .
    import jeus.jms.client.facility.connection.JeusConnection;
    . . .
    Context ctx = new InitialContext();
    // Context.lookup() returns Object, so the result must be cast.
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
    JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
    connection.setRequestBlockingTime(300000); // 5 minutes
    . . .
RequestBlockingTime is also used as the default transaction timeout value for session and XA transactions.
When connection recovery is not configured, JEUS MQ connections share a physical connection (a socket) by default. But if the <reconnect-enabled> element of the connection factory configuration in domain.xml is set to true, each connection gets its own socket so that failover can be performed.
When physical and logical connections establish a one-to-one relationship, a new physical connection has to be created whenever a new connection is created. Since this may result in performance degradation, the client application must be implemented to reuse connections without having to create a new one each time.
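One common way to reuse a connection rather than create one per operation is to keep a single lazily created instance and hand it out to callers. The following is a minimal sketch of that idea in plain Java. `SharedResource` is a hypothetical helper, not part of JEUS MQ. In a real client the supplier would wrap the `createConnection` call.

```java
import java.util.function.Supplier;

// Hypothetical helper for reusing one expensive resource, such as a connection.
class SharedResource<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the resource on first use and returns the same instance afterwards.
    // The method is synchronized so that concurrent callers create it only once.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

Sharing works for connections because the JMS specification allows a Connection to be used concurrently. Sessions, producers, and consumers are not designed for concurrent use and should not be shared between threads in this way.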
On a connection recovery, the connection state is also recovered.
Start state
If Connection.start() has been called to receive messages, then it continues to receive messages after the connection recovers.
Stop state
If Connection.stop() has been called to stop receiving messages, then it does not receive messages after the connection recovers.
Other objects created by using the connection object including sessions and connection consumers are all restored.
Sessions that were created through a connection are restored when the connection recovers, unless Session.close() was called before the failure. For more information, refer to "5.3.6. Session Recovery".
Connection consumers that were created through a connection are restored when the connection recovers, unless Session.close() was called before the failure. If the connection was in the start state before the connection recovers, then the consumer will start receiving messages again after the recovery. Since the messages that were received before the failure are all returned to the server and retrieved again, the Message.getJMSRedelivered() method call for these messages may return "true".
After recovery from the failure, the methods that create sessions or connection consumers re-send their requests and wait for a response. If Connection.close() is invoked, recovery is not performed, regardless of whether a response is issued.
Sessions are automatically restored during the connection recovery process unless Session.close() is called. In addition, other objects derived from the connection object including MessageConsumers or MessageProducers are all restored.
A session implements methods for creating various objects. The following shows how each method is used after recovery.
Message Consumer Creation Method
Message consumer creation methods complete a request after the recovery. If a failure is not handled during RequestBlockingTime, a JMSException is raised.
    createConsumer(Destination destination)
    createConsumer(Destination destination, java.lang.String messageSelector)
    createConsumer(Destination destination, java.lang.String messageSelector, boolean noLocal)
Durable Message Subscriber Creation Method
Durable message subscriber creation methods complete the request after the recovery. If a failure is not handled during RequestBlockingTime, a JMSException is raised.
    createDurableSubscriber(Topic topic, String name)
    createDurableSubscriber(Topic topic, String name, String messageSelector, boolean noLocal)
If an error occurs in the session, the session transaction is affected in the following cases.
commit()
If an error occurs while sending or receiving a message through a message producer or consumer created by a transacted session, a "javax.jms.TransactionRolledBackException" is thrown at the first commit point and the transaction is rolled back. If there are no messages to commit at the commit point, the exception is not thrown. Even after a commit operation fails, subsequent commit operations in the session are executed normally.
If a failure that occurred during the commit operation is not recovered during the RequestBlockingTime, a JMSException is raised. In this case, the commit operation must be checked by using the administration tool.
rollback()
The rollback() method completes a rollback request after the failure is recovered. If a failure that occurred during the rollback operation is not recovered during the RequestBlockingTime, a JMSException is raised. Even after a JMSException, the rollback operation is performed normally.
The Session.recover() method completes the recovery request after the failure has been recovered. If a failure occurs after Session.recover() is called and is not recovered before the RequestBlockingTime expires, a JMSException is raised.
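Because commit operations keep working after a rolled-back commit, an application can repeat the unit of work and commit again in the same session. The following is a minimal sketch of that retry pattern in plain Java. `TxRetryModel`, `Tx`, and `RolledBack` are hypothetical stand-ins for a transacted session and for javax.jms.TransactionRolledBackException. They are not JEUS MQ or JMS types, and the retry limit is an assumption made for the example.

```java
import java.util.concurrent.Callable;

// Hypothetical model of redoing a unit of work whose commit was rolled back;
// not JEUS MQ or JMS API.
class TxRetryModel {
    // Stand-in for javax.jms.TransactionRolledBackException.
    static class RolledBack extends Exception {
        private static final long serialVersionUID = 1L;
    }

    // Stand-in for the commit operation of a transacted session.
    interface Tx {
        void commit() throws RolledBack;
    }

    // Runs the work and commits. If the commit reports a rollback, the work is
    // repeated, up to maxAttempts times in total.
    static <T> T runInTransaction(Tx tx, Callable<T> work, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RolledBack last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = work.call();
            try {
                tx.commit();
                return result;
            } catch (RolledBack e) {
                // The rollback discarded the work, so it can be repeated.
                last = e;
            }
        }
        throw last;
    }
}
```

Only the rollback case is retried here. Any other exception, such as the JMSException raised when the RequestBlockingTime expires, propagates to the caller, which matches the advice to check the commit result with the administration tool in that case.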
When the acknowledge mode of the session is configured to Session.CLIENT_ACKNOWLEDGE, Message.acknowledge() will be issued for the unacknowledged messages that exist in the session. If an error occurs during the acknowledgement, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for each message. The exception notifies that an error has occurred during the message acknowledgement, and the message may be re-delivered.
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
This section describes how to handle the errors that occur while sending messages through message producers.
The send() method of the message producer is blocked until the message is sent to the server and a response is returned. The following describes possible error scenarios for this process.
The send() method is called, but the message has not yet been sent.
After recovery, the message is sent to the server and processed successfully. If the failure is not recovered after the RequestBlockingTime expires, a JMSException is raised.
The send() method is called, and the message was processed on the server. However, a network error occurs.
If the server is reconnected after recovery, a response message is successfully issued. If the failure is not recovered after the RequestBlockingTime expires, a JMSException is raised.
The send() method is called, and the message was processed on the server. However, a server error occurs, and then the server recovers.
Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
The send() method is called, and the message has not yet been processed on the server. However, a network or server error occurs.
Even if the server is reconnected after recovery, it is hard to know whether the message has been successfully transmitted. Thus, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
The MessageID of the failed message can be obtained by calling MessageSendException.getErrorCode().
JEUS supports synchronous and asynchronous message reception methods, which perform recovery in different ways. Synchronous message reception methods are described first, followed by asynchronous message reception methods.
A message consumer provides three methods for receiving messages synchronously: MessageConsumer.receive(), MessageConsumer.receive(long timeout), and MessageConsumer.receiveNoWait().
The following describes what happens when an error occurs during each method call.
receive()
This method blocks until a message arrives. When a failure occurs, however, a message may take a long time to arrive. To avoid waiting indefinitely, the wait time is changed to the RequestBlockingTime. If the failure is recovered before the wait time expires, the receive request is sent again. Otherwise, a JMSException is raised.
If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the received message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception indicates that an error has occurred during the message acknowledgement, and the message may be redelivered.
receive(long timeout)
This method blocks until a message arrives or the timeout expires. When a failure occurs, however, a message may take a long time to arrive. To avoid an excessively long wait, a timeout value greater than the RequestBlockingTime is reduced to the RequestBlockingTime value. If the failure is recovered before the timeout expires, the receive request is sent again. Otherwise, a JMSException is raised.
If the Session.AUTO_ACKNOWLEDGE option is set, an acknowledgement is sent to the server before the message is passed to the client. If an error occurs, the ExceptionListener issues a jeus.jms.common.message.MessageAcknowledgeException for the messages that have not been acknowledged. The exception indicates that an error has occurred during the message acknowledgement, and the message may be redelivered.
receiveNoWait()
This method does not block even if no message has arrived. It immediately returns the next message if one is available.
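The wait-time rule for receive() and receive(long timeout) during a failure can be summarized in one small function. The sketch below is a hypothetical helper written for illustration, not part of the JEUS MQ API. It follows the JMS convention that a timeout of 0 means waiting indefinitely, which is how receive() without arguments behaves.

```java
// Hypothetical helper mirroring the wait-time rule above; not part of the JEUS MQ API.
class ReceiveWaitModel {
    // requestedTimeoutMs == 0 stands for a receive call with no timeout.
    // Returns how long the call waits while a failure is being recovered.
    static long effectiveWaitMs(long requestedTimeoutMs, long requestBlockingTimeMs) {
        if (requestedTimeoutMs == 0 || requestedTimeoutMs > requestBlockingTimeMs) {
            // Indefinite or overly long waits are capped at the RequestBlockingTime.
            return requestBlockingTimeMs;
        }
        return requestedTimeoutMs;
    }
}
```

With the default RequestBlockingTime of 200000 ms, a receive() call and a receive(500000) call both wait at most 200000 ms during a failure, while receive(5000) keeps its own 5000 ms timeout.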
Asynchronously received messages are categorized into those being processed by MessageListener.onMessage, those being acknowledged after being processed by MessageListener.onMessage, or those prefetched and waiting in the client queue.
Each category of messages goes through a different failover process.
If a failure occurs while the onMessage method is processed, the failure is recovered first and then an acknowledgement is sent. After this, the message is normally processed.
If a failure occurs while an acknowledgement is being delivered, the ExceptionListener will issue a jeus.jms.common.message.MessageAcknowledgeException for the message that has not been acknowledged. The exception indicates that a failure occurred during message acknowledgement, and the message may be redelivered.
If a failure occurs while prefetched messages are waiting in the client queue, the failure is recovered first, and the messages are then returned to the server and later redelivered to the client. The Message.getJMSRedelivered() method call for these messages may return "true".
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
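Because a message may be redelivered after a failure, a consumer that must not process the same message twice can keep track of the message IDs it has already handled. The following is a minimal sketch of such a handler in plain Java. `DedupingHandler` is hypothetical and models message IDs and payloads as strings. In a JMS client these values would come from Message.getJMSMessageID() and Message.getJMSRedelivered(). The in-memory set is an assumption made for brevity. It grows without bound and is lost on restart, so a real application would need a bounded or persistent store.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical idempotent handler; message IDs and payloads are plain strings here.
class DedupingHandler {
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<String> delegate;

    DedupingHandler(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    // Processes the payload unless the message is flagged as redelivered and
    // its ID has already been handled. Returns true if the delegate was invoked.
    synchronized boolean handle(String messageId, boolean redelivered, String payload) {
        if (redelivered && processedIds.contains(messageId)) {
            return false; // duplicate delivery after recovery: skip it
        }
        delegate.accept(payload);
        processedIds.add(messageId);
        return true;
    }
}
```

This kind of bookkeeping complements transactions rather than replacing them. It only avoids repeating work for messages the same client instance has already seen.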
JEUS MQ failover is processed automatically and transparently in a client application. However, when messages are lost during message transmission, they must be handled separately through the ExceptionListener.
Message loss in an enterprise messaging application can be critical. The only way to perfectly recover from a failure while preventing message loss is to use transactions.
It is strongly recommended to use the following method to create an application.
In the J2EE environment, messages have to be sent and received within a transaction.
For servlets, the UserTransaction must be looked up through JNDI, and messages must be sent and received within the UserTransaction.
For EJBs, the TransactionAttribute of the EJB method must be set to "Required" or "RequiresNew" so that the messages can be sent and received within a transaction.
General Java clients call the Connection.createSession(true, Session.SESSION_TRANSACTED) method to create a session. Such sessions can send and receive messages within a transaction by calling commit() or rollback().