
MongoDB and Cassandra Cluster Failover

October 21, 2011

One of the most important features of a scalable clustered NoSQL store is how it handles failover. The basic question is: is failover seamless from the client's perspective? This is not always immediately apparent from vendor documentation, especially for open-source packages where documentation is often wanting. The only way to really know is to run your own tests – both to verify vendor claims and to fully understand them. Caveat emptor – especially if the product is free!

Mongo and Cassandra take fundamentally different approaches to clustering. Mongo is based on the classical master/slave model (similar to MySQL), whereas Cassandra is a peer-to-peer system modeled on the eventual consistency paradigm pioneered by Amazon’s Dynamo system. This difference has direct ramifications for failover behavior. The design trade-offs regarding the CAP theorem are described very well in the dbmusings blog post Overview of the Oracle NoSQL Database (Section CAP).

For Mongo, the client can write only to the master, which is therefore a single point of failure. Until a new master is elected from the secondaries, the cluster will not be reachable for writes.

For Mongo reads, you have two options: talk to the master or to the secondaries. The default is to read from the master, which subjects you to the same failover semantics as writes. To enable reading from secondaries, call Mongo.slaveOk() on your client driver. For details, see the MongoDB documentation on replica set reads.
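
For example, with the 2.x Java driver – a minimal sketch, assuming the seed list matches the three local nodes launched below:

  import java.util.Arrays;
  import com.mongodb.Mongo;
  import com.mongodb.ServerAddress;

  // Connect with a seed list of replica set members, then allow
  // reads to be served by secondaries.
  Mongo mongo = new Mongo(Arrays.asList(
      new ServerAddress("localhost", 27017),
      new ServerAddress("localhost", 27018),
      new ServerAddress("localhost", 27019)));
  mongo.slaveOk();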

For Cassandra, as long as your selected consistency policy can be satisfied, failing nodes will not prevent client access. For example, with a replication factor of 3 and quorum consistency, any single node can fail and both reads and writes will still succeed.

Mongo Setup

I’ll illustrate the issue with a simple three-node cluster. For Mongo, you’ll need to create a three-node replica set, which is well described on the Replica Set Tutorial page.

Here’s a convenience shell script to launch a Mongo replica set.

  #!/bin/sh
  # Launch a three-node Mongo replica set on ports 27017-27019.
  # The dbpath directories must exist before mongod will start.
  dir=/work/server-data/mongo-cluster
  mkdir -p $dir/node0 $dir/node1 $dir/node2
  mv nohup.out old-nohup.out
  OPTS="--rest --nojournal --replSet myReplSet"
  nohup mongod $OPTS --port 27017 --dbpath $dir/node0 &
  nohup mongod $OPTS --port 27018 --dbpath $dir/node1 &
  nohup mongod $OPTS --port 27019 --dbpath $dir/node2 &
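
Note that once the mongod processes are up, the replica set still has to be initiated – normally with rs.initiate(cfg) from the mongo shell, as the tutorial describes. For completeness, here is a rough Java-driver equivalent (a sketch only; the member list simply mirrors the script above):

  import java.util.Arrays;
  import com.mongodb.BasicDBObject;
  import com.mongodb.DB;
  import com.mongodb.Mongo;

  // Send the replSetInitiate admin command to one of the nodes.
  Mongo mongo = new Mongo("localhost", 27017);
  DB admin = mongo.getDB("admin");
  BasicDBObject cfg = new BasicDBObject("_id", "myReplSet")
      .append("members", Arrays.asList(
          new BasicDBObject("_id", 0).append("host", "localhost:27017"),
          new BasicDBObject("_id", 1).append("host", "localhost:27018"),
          new BasicDBObject("_id", 2).append("host", "localhost:27019")));
  admin.command(new BasicDBObject("replSetInitiate", cfg));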

Cassandra Setup

For Cassandra, create a keyspace with a replication factor of 3 in your schema definition file.

  create keyspace UserProfileKeyspace
    with strategy_options=[{replication_factor:3}]
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
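
If you prefer to do this from code rather than the CLI, Hector exposes the same operation – roughly as follows (a sketch; the host and Thrift port are assumptions for a local node):

  import java.util.Collections;
  import me.prettyprint.hector.api.Cluster;
  import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
  import me.prettyprint.hector.api.factory.HFactory;

  // Create the keyspace with SimpleStrategy and replication factor 3,
  // starting with no column families.
  Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "host1:9160");
  cluster.addKeyspace(HFactory.createKeyspaceDefinition(
      "UserProfileKeyspace",
      "org.apache.cassandra.locator.SimpleStrategy",
      3,
      Collections.<ColumnFamilyDefinition>emptyList()));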

Since I was using the standard Java Hector client along with Spring, I’ll highlight some of the key Spring bean configurations. The important point is that the consistency policy must be quorum, which means that for an operation (read or write) to succeed, two out of the three nodes must respond.

Properties File

  cassandra.hosts=host1,host2,host3
  cassandra.cluster=MyCluster
  cassandra.consistencyLevelPolicy=quorumAllConsistencyLevelPolicy

Application Context File

  <bean id="userProfileKeyspace"
        class="me.prettyprint.hector.api.factory.HFactory"
        factory-method="createKeyspace">
    <constructor-arg value="UserProfileKeyspace" />
    <constructor-arg ref="cluster"/>
    <property name="consistencyLevelPolicy" ref="${cassandra.consistencyLevelPolicy}" />
  </bean>

  <bean id="cluster"
        class="me.prettyprint.cassandra.service.ThriftCluster">
    <constructor-arg value="${cassandra.cluster}"/>
    <constructor-arg ref="cassandraHostConfigurator"/>
  </bean>

  <bean id="cassandraHostConfigurator"
        class="me.prettyprint.cassandra.service.CassandraHostConfigurator">
     <constructor-arg value="${cassandra.hosts}"/>
  </bean>

  <bean id="quorumAllConsistencyLevelPolicy"
        class="me.prettyprint.cassandra.model.QuorumAllConsistencyLevelPolicy" />

  <bean id="allOneConsistencyLevelPolicy"
        class="me.prettyprint.cassandra.model.AllOneConsistencyLevelPolicy" />

Test Scenario

The basic steps of the test are as follows. Note that the Mongo example assumes you do not have slaveOk turned on.

  • Launch the cluster.
  • Execute N requests, where N is a large number such as 100,000, so that the run is long enough for the cluster to fail over. Each request is either a read or a write.
  • While the N requests are running, kill one of the nodes. For Cassandra this can be any node, since the system is peer-to-peer; for Mongo, kill the master.
  • For Cassandra, there will be no exceptions. If your requests are inserts, you will be able to retrieve them afterwards.
  • For Mongo, your requests will fail until a secondary is promoted to master. This happens for both writes and reads. The time window is “small”, but depending upon your client request rate, the number of failed requests can run into the thousands! See the sample exceptions below.

With Mongo (without slaveOk), the client can directly access only the master node; the secondaries receive replicated data from the master and are never accessed directly by the client. With Cassandra, as long as the minimum number of nodes required by the quorum can be reached, your operation will succeed.
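
The request loop itself can be as simple as the following sketch (the dao and its put signature are hypothetical stand-ins for whatever client wrapper you use, such as the MongodbDao in the stack traces below):

  // Hammer the store while a node is killed out-of-band, and count
  // how many requests fail before the cluster recovers.
  int failures = 0;
  for (int i = 0; i < 100000; i++) {
      try {
          dao.put("key-" + i, "value-" + i);  // or dao.get("key-" + i)
      } catch (Exception e) {
          failures++;  // for Mongo, accumulates until a new master is elected
      }
  }
  System.out.println("failed requests: " + failures);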

Mongo Put Exception Example

I really like the message “can’t say something” – sort of cute!

com.mongodb.MongoException$Network: can't say something
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:159)
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:132)
    at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:343)
    at com.mongodb.DBCollection.save(DBCollection.java:641)
    at com.mongodb.DBCollection.save(DBCollection.java:608)
    at com.amm.nosql.dao.mongodb.MongodbDao.put(MongodbDao.java:48)

Mongo Get Exception Example

com.mongodb.MongoException$Network: can't call something
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:222)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:231)
    at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:303)
    at com.mongodb.DBCursor._check(DBCursor.java:360)
    at com.mongodb.DBCursor._next(DBCursor.java:442)
    at com.mongodb.DBCursor.next(DBCursor.java:525)
    at com.amm.nosql.dao.mongodb.MongodbDao.get(MongodbDao.java:38)

Mongo Get with slaveOk() 

If you invoke Mongo.slaveOk() on your client driver, your reads will not fail when a node goes down. Instead, you will get the following warning.

Oct 30, 2011 8:12:22 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: Server seen down: localhost:27019
java.net.SocketException: Connection reset by peer: socket write error
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.bson.io.PoolOutputBuffer.pipe(PoolOutputBuffer.java:129)
        at com.mongodb.OutMessage.pipe(OutMessage.java:160)
        at com.mongodb.DBPort.go(DBPort.java:108)
        at com.mongodb.DBPort.go(DBPort.java:82)
        at com.mongodb.DBPort.findOne(DBPort.java:142)
        at com.mongodb.DBPort.runCommand(DBPort.java:151)
        at com.mongodb.ReplicaSetStatus$Node.update(ReplicaSetStatus.java:178)
        at com.mongodb.ReplicaSetStatus.updateAll(ReplicaSetStatus.java:349)
        at com.mongodb.ReplicaSetStatus$Updater.run(ReplicaSetStatus.java:296)
Oct 30, 2011 8:12:22 PM com.mongodb.DBTCPConnector _set