
Testing ActiveMQ master-slave failover on a single machine

Apache ActiveMQ supports a number of clustered modes of operation, of which the master-slave failover mode is probably the most widely used. In this mode of operation, two or more message brokers compete for the lock on a shared filestore; the broker that obtains the lock becomes the master, and any others remain slaves. Only the master broker will accept network connections from clients, so this mode of operation works in conjunction with the ActiveMQ client runtime, which detects the master and connects to it. The roles in the cluster will only change if the current master fails, in which case a slave will promote itself to master by taking over the lock that the previous master released.

From the client's perspective, the only change needed to support this clustered mode of operation is to specify an appropriate connection URL. The simplest way to test failover is to use a failover: URL with a static list of candidate broker URLs. Such a URL has this format:

failover:(tcp://host1:port1,tcp://host2:port2...)?options
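To make this concrete, here is a minimal sketch of a plain JMS client that connects using such a URL. It's for illustration only, not part of the test setup below: the class name and message text are arbitrary, and it assumes the ActiveMQ 5.x client libraries (activemq-all, for example) are on the classpath.

  // Send one text message via the failover: transport. The ActiveMQ
  // client runtime tries each listed broker until it finds the master.
  import javax.jms.Connection;
  import javax.jms.MessageProducer;
  import javax.jms.Queue;
  import javax.jms.Session;
  import org.apache.activemq.ActiveMQConnectionFactory;

  public class FailoverDemo {
    public static void main(String[] args) throws Exception {
      ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
          "failover:(tcp://localhost:61616,tcp://localhost:61617)");
      Connection connection = factory.createConnection();
      connection.start();
      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
      Queue queue = session.createQueue("test");
      MessageProducer producer = session.createProducer(queue);
      producer.send(session.createTextMessage("hello"));
      connection.close();
    }
  }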

Master-slave operation is supported using shared file stores with proper locking semantics but, for the purposes of experimentation, it's possible to set up a master-slave cluster on a single host, using an ordinary directory as the file store. It's just a little fiddly to run two instances of ActiveMQ on the same host, because by default they will try to use the same IP ports. So a bit of configuration is needed.

This article describes, step by step, how to set up a two-broker master-slave cluster using two installations of ActiveMQ on the same host, with a simple local directory as the shared filestore. This wouldn't be a great choice for production operation, but it's a neat way to experiment and learn about ActiveMQ clustering.

These instructions are written primarily for Linux systems; in principle other operating systems will work, provided the filesystem supports locking.

If you're not familiar with ActiveMQ at all, consider reading my article JMS messaging from the ground up using Apache ActiveMQ. That article also explains where to obtain ActiveMQ and how to install it.

The tests in this article use my amqutil program, which is a general-purpose utility for interacting with ActiveMQ destinations; a separate article explains how to build and use amqutil. For the purposes of this article, you can use any ActiveMQ client that is capable of connecting with a failover: URL (rather than just a host and port). The command-line client that comes with ActiveMQ will work perfectly well, but I find it a bit awkward to use (although that's probably just one of my little foibles).

Step 1: Set up the cluster

Since we're running two instances of ActiveMQ on the same host, at least one of them will have to have its configuration changed, to avoid port conflicts. Both instances will need to be configured to form a master-slave cluster.

1.1 Install ActiveMQ in two separate directories

Unpack the ActiveMQ zip bundle twice, into any two convenient directories. It might be helpful to name the directories according to the port number that the broker will listen on, because it will be easier to keep track of them. In this example the ports will be 61616 (the default) and 61617, so I name the directories amq61616 and amq61617. For brevity I will refer to the two brokers using these directory names in what follows.
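For example, assuming the 5.10.0 distribution (substitute whichever version you downloaded):

$ unzip apache-activemq-5.10.0-bin.zip
$ mv apache-activemq-5.10.0 amq61616
$ unzip apache-activemq-5.10.0-bin.zip
$ mv apache-activemq-5.10.0 amq61617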

Leave the installation in amq61616 with its configuration at defaults for now. The installation in amq61617 needs the following configuration changes, to avoid port conflicts (since we're running both brokers on the same host).

  • Change the broker port from 61616 to 61617 in the transportConnector section of conf/activemq.xml (see the example below)
  • Remove all the other transportConnector entries in conf/activemq.xml, since we won't need AMQP, MQTT, STOMP, etc., for this test
  • Change the HTTP port from 8161 to 9161 in conf/jetty.xml
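After these edits, the transportConnectors section of amq61617's conf/activemq.xml will look something like this (the exact attributes on the default openwire connector vary between ActiveMQ versions):

  <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61617"/>
  </transportConnectors>

and the web console port in conf/jetty.xml becomes:

  <property name="port" value="9161"/>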

1.2 Test that both broker instances can be started together

They won't start as a cluster if they won't start separately, so it's worth testing this first, to ensure there are no remaining port conflicts. I would recommend running ActiveMQ in the foreground here, so you can see the console output directly:
[amq61616]$ ./bin/activemq console
INFO: Using java '/bin/java'
INFO: Starting in foreground, this is just for debugging purposes (stop process by pressing CTRL+C)
...

1.3 Master-slave configuration

Make these changes after shutting down the two broker instances.

Configure each broker instance to use the same directory to store its message log. In conf/activemq.xml in each broker, edit the persistenceAdapter entry:

  <kahaDB directory="/tmp/mq"/>
You don't have to use /tmp/mq, but the directory must be the same for each broker, and on a filesystem that supports locking.
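In context, the complete persistenceAdapter element in each broker's conf/activemq.xml will look like this:

  <persistenceAdapter>
    <kahaDB directory="/tmp/mq"/>
  </persistenceAdapter>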

Tell each broker instance about the existence of the other by configuring complementary network connectors. In the broker listening on port 61616, edit conf/activemq.xml and add the following after the persistenceAdapter setting.

  <networkConnectors>
     <networkConnector uri="static:(tcp://localhost:61617)"/>
  </networkConnectors>
Conversely, in conf/activemq.xml of the broker on port 61617, create a network connector pointing back at the broker on port 61616.
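That is, mirroring the entry above:

  <networkConnectors>
     <networkConnector uri="static:(tcp://localhost:61616)"/>
  </networkConnectors>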

Now start both brokers in separate consoles again.

1.4 Check that the brokers are listening as expected

Of the two broker ports, 61616 and 61617, only one should be listening for incoming connections right now, even though both brokers are running:
$ netstat -anp|grep 6161|grep LISTEN
tcp6       0      0 :::61616                :::*                    LISTEN      20584/java  
Which port is open will depend on which broker started up first. In what follows I'll assume that amq61616 was quickest off the mark. If that's the case, in the console for the instance amq61617, which is not listening, you should see a message like this:
INFO | Database /tmp/mq/lock is locked... waiting 10 seconds for the database to be unlocked.
  Reason: java.io.IOException: File '/tmp/mq/lock' could not be locked.

Step 2: Test the cluster using amqutil

2.1 Post a message using a failover: URL

Now put a message onto the broker cluster using amqutil:
$ amqutil -r -d test -U failover:\(tcp://localhost:61617,tcp://localhost:61616\)
INFO  org.apache.activemq.transport.failover.FailoverTransport  - 
  Successfully connected to tcp://localhost:61616
Here I'm using the failover: URL to indicate the two broker URLs that the client could use. The INFO message generated by the ActiveMQ client runtime indicates that the connection was made to port 61616, which is what we should expect, as the broker on port 61617 is currently not listening.

This amqutil command produces (the -r switch) one arbitrary text message and sends it to the destination (-d) test.

Now use the amqutil 'browse' (-b) switch to confirm that the message was delivered.

$ amqutil -b -d test -U failover:\(tcp://localhost:61617,tcp://localhost:61616\)
INFO  org.apache.activemq.transport.failover.FailoverTransport  - 
  Successfully connected to tcp://localhost:61616
     0 ID:localhost.localdomain-51622-1412327071242-1:1:1:1:1 TextMessage
amqutil confirms that there is one message on the destination.

2.2 Shut down the master broker

Now shut down the broker that is currently listening for connections (amq61616 in this case). If you're running in console mode, you can just hit ctrl+C in the console window to shut the broker down.
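If you started the broker as a background daemon (./bin/activemq start) rather than in console mode, the stop command does the same job:

[amq61616]$ ./bin/activemq stop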

In the log file for amq61617 (which is still running) you should now see a failure message for the network connector:

2014-10-03 12:53:41,516 | WARN  | ActiveMQ Task-2  
  | DiscoveryNetworkConnector        
  | network.DiscoveryNetworkConnector  156 
  | 121 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610379 
  | Could not start network bridge between: vm://amq?async=false&network=true 
      and: tcp://localhost:61616 due to: java.net.ConnectException: Connection refused
This is as expected, as amq61616 is now down. Check which ports are now open:
$ netstat -anp|grep 6161|grep LISTEN
tcp6       0      0 :::61617                :::*                    LISTEN      20799/java
There is now a service on port 61617. Broker amq61616, having shut down, will have released its lock on the message store, so broker amq61617 has taken up the active, master role in the cluster.

Now use amqutil to check the messages on destination test, using the same failover: URL as before:

$ amqutil -b -d test -U failover:\(tcp://localhost:61617,tcp://localhost:61616\)
INFO  org.apache.activemq.transport.failover.FailoverTransport  - 
  Successfully connected to tcp://localhost:61617
     0 ID:localhost.localdomain-51622-1412327071242-1:1:1:1:1 TextMessage
Note that this time we've got a connection to the broker on port 61617, since 61616 is down. However, the messages are exactly the same (there's just one in this example), since both brokers are sharing a message store.

Broker amq61617 now holds the lock on the shared message log so, even when amq61616 starts up again, it will continue in the slave role, until and unless something causes amq61617 to fail or shut down.
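You can verify this by restarting amq61616 in its console: it will initialize, fail to obtain the lock on /tmp/mq, and settle into the slave role, logging the same lock-wait message seen earlier:

[amq61616]$ ./bin/activemq console
...
INFO | Database /tmp/mq/lock is locked... waiting 10 seconds for the database to be unlocked.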

Summary

Setting up a master-slave failover cluster for testing purposes is straightforward, and a single host can be used for experimentation. A production cluster needs shared storage with reliable locking, such as a SAN or a suitable network filesystem, but otherwise the steps are essentially the same as in this article.