MongoDB vs. Cassandra Benchmarks

Cassandra and Mongo passed the first set of winnowing tests – both in terms of speed and functionality. In order to provide more data for an informed decision, I subjected both to a series of high throughput tests. The test suite was a total of 120 million operations (insert, update, get) – 12 runs of 10 million requests.

For those impatient for the ultimate conclusion, the executive summary is: Cassandra writes are faster, Mongo reads are faster. But what is interesting is how the latencies evolve over time, i.e. over the course of 120 million requests.

Overview of test suite

  • Remove all disk data of the persistent store with rm -rf
  • Start the NoSQL server
  • Maintain a constant throughput of 20 or 50 concurrent client threads with no rampup time
  • Insert N records
  • Get N records – this will reflect any caching or compaction (e.g. Cassandra SSTables)
  • Get N records – this should reflect already compacted tables – should be faster
  • Get N records – should correspond to previous get
  • Update N keys with different sized values (value seed). We want to make sure we’re exercising Mongo compaction
  • Repeat the three get runs again

Results Commentary

When one result is faster than another, the important question to always ask is: how important is the difference? Does it really matter? Is the slower system still fast enough? If the lower value still satisfies your requirements, then the difference is of little signficance and other factors weigh more heavily on the ultimate decision.

In order to interpret results, it is important to have a basic understanding of some statistics. You should not rely solely on the mean, but also pay particular attention to latency percentiles.  Percentiles give you an idea of the distribution of results which is especially important when predictability is required. Your SLA might say:  P reads MUST be less than 10ms where P is the percentage, e.g. 99% or 100%. This of course raises the question what is to be done for those operations that exceed N, but that is another discussion. As you can see, there are some real whoppers – note the Max value for put – 37 seconds for MongoDB and the 175 seconds for Cassandra!

As the tables below indicate,  Cassandra was a winner for puts – both inserts and updates. For the initial 10M puts, the distribution is interesting – 50% of Mongo puts were under 1ms while for Cassandra it was 2ms. But for the 90-th percentile, Mongo took 20ms but Cassandra was 4ms.

For the subsequent two batches of 10M puts (inserts), Cassandra’s throughput was 4054 and 5868 – quite a bit of variability! Note the max value for a put was 174 seconds – must have been one heck of a GC! Mongo’s two insert batches ranged from 1753 to 2197 puts per second. 90% of Cassandra puts are finishing under 4ms, whereas 90% of Mongo puts are  under 17ms.

Cassandra’s superior write numbers are  not surprising since it is optimized for faster writes than reads. See here. In order to avoid expensive disk seeks, writes are appended to sequential files called SSTables. However the hit is taken on reads, where potentially the system has to examine several SSTables for updated values. Unfortunately, for our needs it fast writes were of more importance.

For reads the situation is quite different and Mongo comes out ahead. For the first three get batches Cassandra is giving us between 2300-2600 gets per second. Mongo is in the predictable 3100s. Both of these fall within our target SLA and are roughly comparable.

However, its a completely different story for reads after we do inserts! Look at the numbers: Mongo is slightly slower at 2999 and 3099, but Cassandra drops tenfold to 294 and 350!  Que caramba! I’m assuming this is because of the need to access multiple SSTables, but I would’ve assumed that at some point the values were merged into one file.

Perhaps there’s some tuning that needs to be done for Cassandra though I saw nothing obvious in my initial look at the documentation. All I can say with confidence is, is that un-tuned Cassandra get operations drastically slow down. I did invoke an explicit compaction using nodetool, and throughput for next 10 millions gets did improve to 1000/sec. But the caveat was that Cassandra documentation did not recommend this since automated compactions will not be run after a manual one! By the way, one of Mongo’s strong suits is that they strive to minimize the amount of configuration required to get a system running.

Context

In order to establish a baseline for the vendor’s capabilities, a non-clustered server was used. For Cassandra this was single node cluster, for Mongo simply one server with no replication. Cluster tests were run for functionality.

Test Configuration

  • Number of requests: 10,000,000
  • Object Mapper: Jackson JSON
  • Key size: 32
  • Value size: average is 1000 bytes, range from 100 to 1900
Server specs
  • Centos 5.5 Linux
  • Intel Xeon 2.4 MHz 64 bit processor
  • 8 processors, 2 cores each
  • 8 GB RAM
  • The client pinging the server averaged 0.2 ms
The columns are for the most part self-descriptive. The ValSeed column indicates the random seed for the value size. This is important since we want to exercise the vendor’s compaction algorithm by assigning differently-sized values to an updated key. Values are generated  by creating an N-sized String from a randomly set of  ASCII characters. Since we want to avoid equally sized value, we want to vary N  from 100 to 1800.
  int baseSize = 100 ;
  int range = 1800 ;
  Random random = new Random(valueSizeSeed)
  int valueSize = random.nextInt(range) + baseSize ;

MongoDB

  • Version: 2.0.0
  • Journaling: turned on
Test     Req/Sec   Millis    50%    90%    99%  99.5%  99.9%    Max     Mean  Err  Fail Thr ValSeed
Put         2114  4729454      1     20     43     60   1576  36232    9.442    0     0  20    1776
Get         3182  3142258      2     10     18     28    273   4684    6.268    0     0  20
Get         3182  3142850      2     10     18     26    229   3950    6.269    0     0  20
Get         3106  3219539      2     33     88    144   2120   3885   16.060    0     0  50
Put         1753  5706060      1     17     76    278   1985  44472   11.395    0     0  20    1812
Put         2197  4552045      1     17     52    182   1293  37299    9.087    0     0  20    1846
Get         2999  3333966      2     11     19     39    308   4380    6.651    0     0  20
Get         3039  3290874      2     10     19     41    289   4676    6.565    0     0  20
PutGet       907 11027045     14     28     70    150   2531   8886   22.036    0     0  20    1861
Get         2992  3342034      2     11     20     40    299   4299    6.666    0     0  20
Get         2975  3361762      2     11     20     38    301   4478    6.707    0     0  20
Get         2957  3381393      2     34    112    166   2160   4363   16.871    0     0  50

Cassandra – Version: 0.8.6

Test     Req/Sec   Millis    50%    90%    99%  99.5%  99.9%    Max     Mean  Err  Fail Thr ValSeed
Put         6045  1654219      2      4     16     24    160   4169    3.288    0     0  20    1776
Get         2334  4285208      1     14    111    150    427   9003    8.552    0     0  20
Get         2380  4202468      2     10    110    144    290   8380    8.387    0     0  20
Get         2616  3822749      8     40    177    226   1559  24092   19.069    0     0  50
Put         4054  2466583      2      4     17     26    194 174614    4.915    0     0  20    1812
Put         5768  1733745      2      4     16     24    172   4190    3.446    0     0  20    1846
Get          294 33972166     43    141    371    567   1737  67276   67.928    0     0  20
Get          350 28551712     39    116    256    354   1629  37371   57.087    0     0  20
PutGet       285 35124479     40    124    388    801   2384 456321   70.232    0     0  20    1861
Get          210 47730677     79    182    341    418   1141  36368   95.446    0     0  20
Get          211 47305663     79    180    335    409   1097 104592   94.595    0     0  20
Get          249 40157243    175    339    598    730   2164  77675  200.751    0     0  50

Variability Examples

Here are some examples demonstrating the variability in results. The examples are from identical tests in the same context.

Mongo

First 10M puts

Test     Req/Sec   Millis    50%    90%    99%  99.5%  99.9%    Max     Mean  Err  Fail  Thr
Put         2137  4679117      1     21     41     49   1388  40931    9.338    0     0  20
Put         2156  4639027      1     21     41     48   1450  36412    9.258    0     0  20
Put         2114  4729454      1     20     43     60   1576  36232    9.442    0     0  20
Put         2079  4810938      1     20     42     55   1354  38436    9.605    0     0  20

First 10M gets

Test     Req/Sec   Millis    50%    90%    99%  99.5%  99.9%    Max     Mean  Err  Fail  Thr

Get         3187  3138189      2     10     18     25    225   4665    6.260    0     0  20
Get         3115  3210200      2     10     18     26    258  13211    6.403    0     0  20
Get         3182  3142258      2     10     18     28    273   4684    6.268    0     0  20
Get         3141  3184198      2     10     18     27    271  22285    6.352    0     0  20
Advertisements

6 Responses to “MongoDB vs. Cassandra Benchmarks”

  1. Tabrez Says:

    Your blog is pretty informative and interesting!

    Did you post these observations about 10 fold drop in performance to apache or datastax community?

    It will be interesting to know whether it is a tuning issue or a limitation.

  2. Praveen Says:

    It would be great if this test can be repeated with latest cassandra 1.0.x branch and latest MongoDB 2.0.x branch. Both have evolved quite a bit in this course of time and cassandra gained some optimizations for read performance.

    It would be great to see how they compare now. May be a rolling comparison of these two monster DBs could be just what an architect would need to asses trends in performace optimizations and make an informed decision for his needs.

    Nevertheless the effort to compare them deserves applause and appreciation.

    Cheers.

  3. nitin khosla Says:

    Hi Andre – I was wondering if read stats(dropping 10 fold) still apply to latest Cassandra/MongoDB branch?
    (if you played around and compared with latest branches..)

  4. Ken Says:

    About the slowdown after a ton of inserts.
    http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: