Mongo Conference Impressions

October 17, 2011

Last week I attended a full-day Mongo conference hosted at our local Microsoft Nerd Center. The timing was quite fortuitous, as I’m heavily involved in evaluating Mongo and Cassandra for a very large data store (600 million records). My head was full of questions, especially regarding replication and sharding scenarios.

I noticed that 10gen seems to be very responsive to its users, and on numerous occasions speakers emphasized that client feedback drove many new features. Furthermore, speakers were very open about Mongo’s shortcomings. For example, they openly admitted that free list management was, in their opinion, wanting (I would never have known), and that version 2.2 would include a major overhaul. And above all, the no-fluff quotient was high – it seems everyone at 10gen writes code. See: 10Gen CEO Dwight Merriman Still Writes His Own Code!

Overall, the conference was great – a large turnout of 250 people and a good mix of presentations by 10gen folks and by customers showcasing their uses of Mongo. One perennial conference problem I had to wrestle with was deciding which of the concurrently scheduled sessions to attend! MTV CMS vs. Morphia Java? Replicas or the Art Genome Project?

I was specifically interested in obtaining more details about MongoDB’s scaling capabilities in the real world – what are some of the largest sites out there, what are their issues, etc. Some of the tidbits I picked up:

  • Largest cluster is 1000 shards
    • Each shard contains a few terabytes of data
    • Each shard is a replica set of three
  • Not many folks are using sharding – the typical sharding factor is between 3 and 10 shards.

The “Journaling and Storage Engine” talk by CTO Eliot Horowitz was full of gory (and great) details on internals. The description of how and why MongoDB uses memory-mapped files was very interesting. Other subjects covered were how data and indexes are stored, journaling, fragmentation, and record padding. The upcoming MongoDB version 2.2 will have a new, improved fragmentation implementation.

The talk on “Schema Design at Scale” was particularly enlightening and opened my eyes to the entirely new topic of document-oriented schema design. Just because the schema is flexible doesn’t mean that schema problems go away. On the contrary, because the flexibility allows for more choices and therefore fewer constraints, the number of design decisions correspondingly increases. This presents a whole new set of issues – many of them intellectually very interesting (e.g. best practices for embedded collections). And many problems are the same as those facing traditional SQL databases: covering indexes, sharding partition keys, key auto-increments, B-tree issues, etc. I forgot to ask what 10gen’s take was on the recently introduced UnQL (Unstructured Query Language). In UnQL’s own words, it’s “an open query language for JSON, semi-structured and document databases.”

The “Replication and Replica Sets” presentation described MongoDB’s replication feature in detail. Essentially it is a master/slave model, in contrast to Cassandra’s peer-to-peer design. One failover problem I had discovered in high-throughput testing was the time window between a master’s death and a slave’s promotion during which writes were not accepted. The 10gen speaker confirmed my suspicions and suggested queueing failed writes and resubmitting them later (not ideal). Another issue was that heartbeats are hard-coded to 200 ms and not configurable. One nice new feature in the works is standardized client access to replica sets. Currently the routing logic lives in the client drivers, so sites using a mix of drivers for different languages could run into problems.
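The speaker’s workaround, queueing failed writes and resubmitting them later, can be sketched in Java as follows. This is a driver-agnostic illustration, not a MongoDB API: WriteSink and RetryingWriter are hypothetical names, and the sketch assumes a failed write surfaces as an exception while no primary is available.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical abstraction over "send this write to the current primary". */
interface WriteSink {
    /** Assumed to throw while no primary is available (e.g. during an election). */
    void write(String document) throws Exception;
}

/** Parks writes that fail during failover and replays them later, preserving order. */
class RetryingWriter {
    private final WriteSink sink;
    private final Deque<String> pending = new ArrayDeque<>();

    RetryingWriter(WriteSink sink) {
        this.sink = sink;
    }

    /** Tries the write; if it fails, or older writes are still queued, queues it instead. */
    void write(String document) {
        if (!pending.isEmpty()) {
            pending.addLast(document); // stay behind older, still-unsent writes
            return;
        }
        try {
            sink.write(document);
        } catch (Exception e) {
            pending.addLast(document);
        }
    }

    /** Replays queued writes in order; stops at the first failure. Returns how many succeeded. */
    int resubmitPending() {
        int replayed = 0;
        while (!pending.isEmpty()) {
            try {
                sink.write(pending.peekFirst());
            } catch (Exception e) {
                break; // still no primary: leave the rest for the next attempt
            }
            pending.removeFirst();
            replayed++;
        }
        return replayed;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because the queue lives only in client memory, anything still pending is lost if the client process dies, which is part of why the approach is not ideal.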

The “Sharding and Scaling” talk by the CTO outlined the classic sharding problem – the difficulty of choosing a good shard key. Lots of information was provided on the Mongo shard router process “mongos”, which routes requests to the data process “mongod”. And then there is a config server process too – quite a few processes are involved here. I just noticed a new Developer Blog Contest: How do you Shard your Data? A point emphasized by several folks was not to wait until the last moment to add a new node to your cluster. It’s best to add one when the current nodes are at 70% capacity – interestingly, the same percentage that Cassandra advocates. In general, adding a new node to a live cluster is a very difficult exercise with regard to repartitioning the current data. I didn’t get around to asking whether and how Mongo uses consistent hashing, which is the basis of Dynamo-like eventually consistent stores.
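For reference, here is a minimal sketch of consistent hashing, the partitioning technique behind Dynamo-style stores. It says nothing about how MongoDB actually partitions data (the open question above); the class name, the MD5-based hash and the virtual-node count are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Minimal consistent-hash ring with virtual nodes (tokens). */
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Maps a string onto the ring using the first 8 bytes of its MD5 digest. */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Places several tokens per node so load spreads more evenly. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** The owner of a key is the first token at or after the key's hash, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Adding a node only moves the keys that land on the new node’s arcs of the ring; every other key stays where it was, which is what makes growing a live cluster cheaper than a full repartition.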

From a customer use-case perspective, Jeff Yemin of MTV gave a great talk on how MTV is currently using MongoDB, and also described the historical evolution of their CMS – from SQL, to an XML database, and finally to a document-oriented store. It’s always instructive to see how people arrive at a decision. It’s all about the old philosophical distinction between the context of discovery and the context of justification. They’re not using sharding since all their data fits on one disk.

Finally, new features for Mongo 2.2, due in January, were described: improvements in concurrency, TTL collections, hash sharding, and free list management. A major concern of mine was data expiration, since for my current project we need to regularly evict old data to make room for new records. Currently the only solution is to create a timestamp index and write a manual cron-like job to delete stale items. I’m looking forward to TTL collections!
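The manual workaround, a timestamp index plus a cron-like purge job, might look roughly like this in Java. An in-memory ConcurrentSkipListMap ordered by timestamp stands in for the timestamp index and a ScheduledExecutorService stands in for cron; ExpiringStore is a hypothetical name, and a real version would issue a range delete against the database instead.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** In-memory stand-in for a collection with a timestamp index and a periodic purge job. */
class ExpiringStore {
    // Key: timestamp in millis (the "timestamp index"); value: the record.
    // Simplification: two records with the same millisecond would overwrite each other.
    private final ConcurrentSkipListMap<Long, String> byTimestamp = new ConcurrentSkipListMap<>();
    private final long ttlMillis;

    ExpiringStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void insert(long timestampMillis, String record) {
        byTimestamp.put(timestampMillis, record);
    }

    /** Deletes every record strictly older than now - ttl; returns how many were removed. */
    int purgeExpired(long nowMillis) {
        ConcurrentNavigableMap<Long, String> stale = byTimestamp.headMap(nowMillis - ttlMillis);
        int removed = stale.size();
        stale.clear(); // clearing the head view removes the entries from the backing map
        return removed;
    }

    int size() {
        return byTimestamp.size();
    }

    /** Runs purgeExpired every periodSeconds, like a cron job. */
    ScheduledExecutorService schedulePurge(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> purgeExpired(System.currentTimeMillis()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Call shutdown() on the executor returned by schedulePurge when the application stops.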

Cassandra Java Annotations

August 30, 2010

Overview

Cassandra has a unique column-oriented data model which does not easily map to an entity-based Java model. Furthermore, the Java Thrift client implementation is very low-level and presents the developer with a rather difficult API to work with on a daily basis. This situation is a good candidate for an adapter to shield the business code from mundane plumbing details.

I recently did some intensive Cassandra (version 0.6.5) work to load millions of geographical positions for ships at sea. Locations were already being stored in MySQL/InnoDB using JPA/Hibernate, so I already had a ready-made model based on JPA entity beans. After some analysis, I created a mini-framework based on custom annotations and a substantial adapter to encapsulate all the “ugly” Thrift boilerplate code. Naturally, everything was wired together with Spring.

Implementation

The very first step was to investigate existing Cassandra Java client toolkits. As usual in a startup environment, time was at a premium, but I quickly checked out a few key clients. First, I looked at Hector, but its API still exposed too much of the Thrift cruft for my needs. It did have nice features for failover and connection pooling, and I will definitely look at it in more detail in the future. Pelops looked really cool with its Mutators and Selectors, but it too dealt with columns – see the description. What I was looking for was an object-oriented way to load and query Java beans. Note that this OO entity-like paradigm might not be applicable to other Cassandra data models, e.g. sparse matrices.

And then there was DataNucleus, which advertises JPA/JDO implementations for a large variety of non-SQL persistence stores: LDAP, Hadoop HBase, Google App Engine, etc. There was mention of a Cassandra solution, but it wasn’t yet ready for prime time. How they manage to address the massive semantic mismatch between JPA and such stores is beyond me – unfortunately I didn’t have time to drill down. Seems fishy – but I’ll definitely check this out in the future. Even though I’m a big fan of using existing frameworks/tools, there are times when “rolling your own” is the best course of action.

The following collaborating classes comprise the framework:

  • CassandraDao – High-level class that understands annotated entity beans
  • ColumnFamily – An adapter for common column family operations – hides the Thrift gore
  • AnnotationManager – Manages the annotated beans
  • TypeMapper – Maps Java data types into bytes and vice versa

Since we already had a JPA-annotated Location bean, my first thought was to reuse this class and simply map the JPA annotations to their equivalent Cassandra concepts. Upon further examination this proved ugly – the semantic mismatch was too great. I certainly did not want to be importing JPA/Hibernate packages into a Cassandra application! Furthermore, many annotations (such as collections) were not applicable, and I needed annotations for Cassandra concepts that did not exist in JPA. In “set theoretic” terms, there are JPA-specific features, Cassandra-specific features and an intersection of the two.

The first-pass implementation required only three annotations: Entity, Column and Key. The Entity annotation is a class-level annotation with keyspace and columnFamily attributes. The Column annotation closely corresponds to its JPA equivalent. The Key annotation specifies the row key. Together, the annotations on a bean define the keyspace and column family that the entity belongs to, as well as its constituent columns. The CassandraDao class corresponds to a single column family; its constructor accepts the entity class, a Cassandra client and a type mapper.
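The three annotations could be declared roughly as below. The attribute names (keyspace, columnFamily, name) come from the description above; the retention policy, targets and the AnnotationScanner helper (a hypothetical stand-in for what AnnotationManager might do with reflection) are assumptions. Since the entity beans in this post annotate getters, Column and Key allow METHOD as well as FIELD targets.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Class-level: which keyspace and column family the bean maps to. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Entity {
    String keyspace();
    String columnFamily();
}

/** Marks a getter (or field) as a Cassandra column. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.FIELD})
@interface Column {
    String name();
}

/** Marks the property that supplies the row key. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.FIELD})
@interface Key {}

/** Hypothetical helper reading the metadata via reflection (getter annotations only). */
final class AnnotationScanner {
    private AnnotationScanner() {}

    /** Returns "keyspace.columnFamily" for an @Entity class. */
    static String columnFamilyOf(Class<?> clazz) {
        Entity entity = clazz.getAnnotation(Entity.class);
        if (entity == null) {
            throw new IllegalArgumentException(clazz + " is not annotated with @Entity");
        }
        return entity.keyspace() + "." + entity.columnFamily();
    }

    /** Column names declared on public getters; reflection gives no guaranteed order. */
    static List<String> columnNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getMethods()) {
            Column column = m.getAnnotation(Column.class);
            if (column != null) {
                names.add(column.name());
            }
        }
        return names;
    }

    /** Name of the column whose getter also carries @Key, or null if none. */
    static String keyColumn(Class<?> clazz) {
        for (Method m : clazz.getMethods()) {
            if (m.isAnnotationPresent(Key.class) && m.isAnnotationPresent(Column.class)) {
                return m.getAnnotation(Column.class).name();
            }
        }
        return null;
    }
}
```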

Two column families were created: a standard column family for ship definitions, and a super column family for ship locations. The Ship CF was a simple collection of ship details keyed by each ship’s MMSI (Maritime Mobile Service Identity, a unique nine-digit ID for a ship’s radio station). The Location CF represented a one-to-many relationship for all the recorded locations of a ship. The row key was the ship’s MMSI, and the super column names were Longs representing the millisecond timestamp of each location. Each super column contained the columns defined in the ShipLocation bean – latitude, longitude, course over ground, speed over ground, etc. The number of locations for a given ship could range into the millions!
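As a mental model only (not how Cassandra stores data on disk), the Location super column family can be pictured as nested sorted maps: row key (MMSI) to super column name (timestamp) to sub-columns. The class name and string-typed values are simplifying assumptions; the point is that sorting super columns by their Long names makes time-range slices natural.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** Mental model of a super column family: rowKey -> superColumnName -> (column -> value). */
class SuperColumnFamilyModel {
    private final Map<String, NavigableMap<Long, Map<String, String>>> rows = new TreeMap<>();

    /** Stores one ship location: row = MMSI, super column = timestamp, sub-columns = fields. */
    void insert(String mmsi, long timestampMillis, Map<String, String> columns) {
        rows.computeIfAbsent(mmsi, k -> new TreeMap<Long, Map<String, String>>())
            .put(timestampMillis, columns);
    }

    /** Locations with timestamps in [fromMillis, toMillis), in time order. */
    SortedMap<Long, Map<String, String>> slice(String mmsi, long fromMillis, long toMillis) {
        NavigableMap<Long, Map<String, String>> row = rows.get(mmsi);
        if (row == null) {
            return new TreeMap<Long, Map<String, String>>();
        }
        return row.subMap(fromMillis, true, toMillis, false);
    }
}
```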

From an implementation perspective, I was rather surprised to find that there are no standard reusable classes to map basic Java data types to bytes. Sure, String has getBytes(), but I had to do some non-trivial, distracting detective work to get doubles, longs, BigInteger, BigDecimal and Dates converted – all the bit-shifting magic, etc. I also made sure to run some performance tests to choose the best alternative!
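For the fixed-width types, one option is to let java.nio.ByteBuffer do the shifting; BigInteger already has a byte form, and BigDecimal only needs its scale stored alongside the unscaled value. This is a sketch, not the author’s DefaultTypeMapper: the class and method names are hypothetical, and the encoding (big-endian, Dates as epoch milliseconds, a 4-byte scale prefix for BigDecimal) is an assumption.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Date;

/** Sketch of a type mapper built on ByteBuffer (big-endian by default). */
final class ByteBufferTypeMapper {
    private ByteBufferTypeMapper() {}

    static byte[] toBytes(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

    static byte[] toBytes(double v) { return ByteBuffer.allocate(Double.BYTES).putDouble(v).array(); }
    static double toDouble(byte[] b) { return ByteBuffer.wrap(b).getDouble(); }

    /** Dates are stored as epoch milliseconds. */
    static byte[] toBytes(Date d) { return toBytes(d.getTime()); }
    static Date toDate(byte[] b) { return new Date(toLong(b)); }

    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String toString(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    /** BigInteger already has a two's-complement byte representation. */
    static byte[] toBytes(BigInteger v) { return v.toByteArray(); }
    static BigInteger toBigInteger(byte[] b) { return new BigInteger(b); }

    /** BigDecimal: 4-byte scale followed by the unscaled value's bytes. */
    static byte[] toBytes(BigDecimal v) {
        byte[] unscaled = v.unscaledValue().toByteArray();
        return ByteBuffer.allocate(Integer.BYTES + unscaled.length)
                .putInt(v.scale())
                .put(unscaled)
                .array();
    }

    static BigDecimal toBigDecimal(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        int scale = buf.getInt();
        byte[] unscaled = new byte[buf.remaining()];
        buf.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }
}
```

Big-endian longs also compare correctly byte by byte for non-negative values such as millisecond timestamps, which helps when they are used as column names.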

CassandraDao

The DAO is based on the standard concept of a generic DAO, of which many versions are floating around. The initial version of the DAO with basic CRUD functionality is outlined below (method bodies omitted):

public class CassandraDao<T> {
  // clazz: the annotated entity bean; mapper: converts Java types to and from bytes
  public CassandraDao(Class<T> clazz, CassandraClient client, TypeMapper mapper)

  // Standard column family: one entity per row
  public T get(String key)
  public void insert(T entity)

  // Super column family: one entity per super column within a row
  public T getSuperColumn(String key, byte[] superColumnName)
  public List<T> getSuperColumns(String key, List<byte[]> superColumnNames)
  public void insertSuperColumn(String key, T entity)
  public void insertSuperColumns(String key, List<T> entities)
}

Of course, more complex batch and range operations mirroring the advanced Cassandra API methods are also needed.

Usage Sample

  import java.util.Date;
  import java.util.List;
  import com.google.common.collect.ImmutableList;
  import org.springframework.context.ApplicationContext;
  import org.springframework.context.support.ClassPathXmlApplicationContext;

  // hypothetical wrapper class so the sample compiles as a unit
  public class ShipDaoSample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
      // initialization
      ApplicationContext context = new ClassPathXmlApplicationContext("config.xml");
      CassandraDao<Ship> shipDao = (CassandraDao<Ship>) context.getBean("shipDao");
      CassandraDao<ShipLocation> shipLocationDao =
        (CassandraDao<ShipLocation>) context.getBean("shipLocationDao");
      TypeMapper typeMapper = (TypeMapper) context.getBean("typeMapper");

      // get ship
      Ship ship = shipDao.get("1975");

      // insert ship
      Ship newShip = new Ship();
      newShip.setMmsi(1975); // note: row key - framework insert() converts it to the required String
      newShip.setName("Hokulea");
      shipDao.insert(newShip);

      // get ship location (super column)
      byte[] superColumn = typeMapper.toBytes(1283116367653L);
      ShipLocation location = shipLocationDao.getSuperColumn("1975", superColumn);

      // get ship locations (super columns)
      ImmutableList<byte[]> superColumns = ImmutableList.of( // Until Java 7, Google rocks!
        typeMapper.toBytes(1283116367653L),
        typeMapper.toBytes(1283116913738L),
        typeMapper.toBytes(1283116977580L));
      List<ShipLocation> locations = shipLocationDao.getSuperColumns("1975", superColumns);

      // insert ship location (super column)
      ShipLocation newLocation = new ShipLocation();
      newLocation.setMmsi(1975);
      newLocation.setTimestamp(new Date());
      newLocation.setLat(20.0);  // Double setters: an int literal such as 20 would not autobox
      newLocation.setLon(-90.0);
      shipLocationDao.insertSuperColumn("1975", newLocation);
    }
  }

Java Entity Beans

Ship

@Entity( keyspace="Marine", columnFamily="Ship")
public class Ship {
  private Integer mmsi;
  private String name;
  private Integer length;
  private Integer width;

  @Key
  @Column(name = "mmsi")
  public Integer getMmsi() {return this.mmsi;}
  public void setMmsi(Integer mmsi) {this.mmsi= mmsi;}

  @Column(name = "name")
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
}

ShipLocation

import java.util.Date;

@Entity(keyspace = "Marine", columnFamily = "ShipLocation")
public class ShipLocation {
  private Integer mmsi;
  private Date timestamp;
  private Double lat;
  private Double lon;

  @Key
  @Column(name = "mmsi")
  public Integer getMmsi() {return this.mmsi;}
  public void setMmsi(Integer mmsi) {this.mmsi= mmsi;}

  @Column(name = "timestamp")
  public Date getTimestamp() {return this.timestamp;}
  public void setTimestamp(Date timestamp) {this.timestamp = timestamp;}

  @Column(name = "lat")
  public Double getLat() {return this.lat;}
  public void setLat(Double lat) {this.lat = lat;}

  @Column(name = "lon")
  public Double getLon() {return this.lon;}
  public void setLon(Double lon) {this.lon = lon;}
}

Spring Configuration

 <bean id="propertyConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
   <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE" />
   <property name="location" value="classpath:config.properties" />
 </bean>
 <bean id="shipDao" class="com.andre.cassandra.dao.CassandraDao" scope="prototype" >
   <constructor-arg value="com.andre.cassandra.data.Ship" />
   <constructor-arg ref="cassandraClient" />
   <constructor-arg ref="typeMapper" />
 </bean>
 <bean id="shipLocationDao" class="com.andre.cassandra.dao.CassandraDao" scope="prototype" >
   <constructor-arg value="com.andre.cassandra.data.ShipLocation" />
   <constructor-arg ref="cassandraClient" />
   <constructor-arg ref="typeMapper" />
 </bean>

 <!-- ${cassandra.host} and ${cassandra.port} are resolved from config.properties or system properties -->
 <bean id="cassandraClient" class="com.andre.cassandra.util.CassandraClient" scope="prototype" >
   <constructor-arg value="${cassandra.host}" />
   <constructor-arg value="${cassandra.port}" />
 </bean>

 <bean id="typeMapper" class="com.andre.cassandra.util.DefaultTypeMapper" scope="prototype" />

Annotation Documentation

Annotations

  • Entity (Class) – Defines the keyspace and column family
  • Column (Field) – Column name
  • Key (Field) – Row key

Entity Attributes

  • keyspace (String) – Keyspace
  • columnFamily (String) – Column family

Initial Cassandra Impressions

August 30, 2010

Recently I’ve been doing some intensive work with the popular NoSQL datastore Cassandra. In this post I describe some of my first impressions of working with the Cassandra Thrift Java stubs, along with some comparisons with Voldemort – another NoSQL store that I am familiar with.

Cassandra Issues

Data Model

The Cassandra data model – with its columns and super columns – is radically different from the traditional SQL data model. Most Cassandra descriptions are example-based, and though rich in detail they lack generality. While examples are necessary, they are not sufficient. What is missing is some formalism to capture the essential qualities of the model, which no single example fully captures. I recently came across a very good article about the NoSQL data model from a “relational purist” that strongly resonates with me – see The Cassandra Data Model – highly recommended!

One day soon, I’ll try to write a new post summarizing some of my thoughts on NoSQL data modeling. In short, as the field matures there is going to be a need for some kind of standards across the wide variety of implementations. There are distinct NoSQL categories: key/value stores, column-oriented stores, document-oriented stores – but even within these categories there is much unnecessary overlap.

Regarding Cassandra columns, here’s a bit of clarification that may help. There are essentially two kinds of column families:

  • Those that have a fixed, finite set of columns. The columns represent the attributes of a single object. Each row has the same set of columns, and the column names are fixed metadata.
  • Those that have an unbounded set of columns that represent a collection for the key. The confusing part is that the column name is not really metadata – it is actually a value in its own right!
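The two kinds can be pictured with plain Java maps, purely as an illustration with hypothetical names: in the first, every row repeats the same column names; in the second, the column names are themselves data (timestamps here) and the row simply keeps growing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class ColumnFamilyKinds {
    private ColumnFamilyKinds() {}

    /** Kind 1: one row describes one object; the column names ("name", "length") are fixed metadata. */
    static Map<String, String> shipRow(String name, int length) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", name);
        row.put("length", Integer.toString(length));
        return row;
    }

    /** Kind 2: one row is a collection; each column name is itself a value (a timestamp). */
    static SortedMap<Long, String> positionRow(long[] timestamps, String[] positions) {
        if (timestamps.length != positions.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        SortedMap<Long, String> row = new TreeMap<>(); // sorted by column name, i.e. by time
        for (int i = 0; i < timestamps.length; i++) {
            row.put(timestamps[i], positions[i]);
        }
        return row;
    }
}
```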

Thrift Client Limitations

Let me be frank – working with Cassandra’s Java Thrift client is a real pain. In part this is due to the auto-generated, cross-platform nature of the beast, but many pain points reflect accidental rather than inherent complexity. As Cassandra/Thrift matures, I hope more attention will be paid to easing the life of poor programmers.

No class hierarchy for Thrift exceptions

Thrift exceptions don’t derive from a common base class, which is truly disappointing. Interestingly, Google ProtoBuf doesn’t provide one either! The developer is forced to either catch up to five exceptions for each call, or resort to the ugly catch Exception workaround. How much nicer it would have been to catch one Thrift base exception!

For example, just look at all the exceptions thrown by the get method of Cassandra.Client!

  • org.apache.cassandra.thrift.InvalidRequestException
  • org.apache.cassandra.thrift.UnavailableException
  • org.apache.cassandra.thrift.TimedOutException
  • org.apache.thrift.TException
  • java.io.UnsupportedEncodingException
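Short of a common base class, one workaround is a small adapter that funnels every checked exception from a client call into a single application exception. ThriftCalls and CassandraAccessException are hypothetical names, and the sketch deliberately does not depend on the Thrift classes; on Java 7 and later, multi-catch (catch (A | B e)) also cuts down the noise.

```java
import java.util.concurrent.Callable;

/** Hypothetical single unchecked exception for every Cassandra/Thrift failure. */
class CassandraAccessException extends RuntimeException {
    CassandraAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Funnels the checked exceptions of a client call into CassandraAccessException. */
final class ThriftCalls {
    private ThriftCalls() {}

    static <T> T call(String description, Callable<T> action) {
        try {
            return action.call();
        } catch (RuntimeException e) {
            throw e; // already unchecked: pass it through untouched
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            // InvalidRequestException, UnavailableException, TimedOutException, TException, ...
            throw new CassandraAccessException(description + " failed: " + e, e);
        }
    }
}
```

A call site then wraps the Thrift call in a Callable and only ever has to handle CassandraAccessException.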

No class hierarchy for Column and SuperColumn

The core Thrift concepts Column and SuperColumn lack a common base class for “implementation” reasons, namely the cross-platform limitations of Thrift. Instead there is a ColumnOrSuperColumn class that encapsulates return results where either a Column or a SuperColumn could be returned. For example, see get_slice. This leads to horrible, non-OO and problematic switch-like logic – if isSetColumn() is true then call getColumn(), or if isSetSuper_column() is true then call getSuper_column(). Aargh!

Documentation

Neither Voldemort nor Cassandra provides satisfactory documentation. If you are going to bet your company’s future on one of these products, you definitely have a right to expect better documentation. Interestingly, other open-source NoSQL products such as MongoDB and Riak have better documentation.

Documentation for Voldemort configuration properties was truly a disaster (at least in version 60.1). Parameters responsible for key system performance or even basic functionality were either cryptically documented or not documented at all. I counted a total of sixty properties. For the majority we were forced to scour the source code to get some basic understanding. Totally unnecessary! Some examples: client.max.threads, client.max.connections.per.node, client.max.total.connections, client.connection.timeout.ms, client.routing.timeout.ms, client.max.queued.requests, enable.redirect.routing, socket.listen.queue.length, nio.parallel.processing.threshold, max.threads, scheduler.threads, socket.timeout.ms, etc.

Comparison of Cassandra with Voldemort

At a basic level, both Cassandra and Voldemort are sharded key/value stores modeled on Dynamo. Cassandra can be regarded as a superset in that it also provides a data model on top of the base K/V store.

Some comparison points with Voldemort:

  • Cluster node failover
  • Quorum policies
  • Read or write optimized?
  • Can nodes be added to the cluster dynamically?
  • Pluggable store engines: Voldemort supports pluggable engines, Cassandra does not.
  • Dynamically adding column families
  • Hinted Hand-off
  • Read Repair
  • Vector Clocks

Cluster Node Failover

A Voldemort client can specify one or more cluster nodes to connect to. The first node that the client connects to returns a list of all the nodes in the cluster. The client stubs then handle failover and load balancing. In fact, you can plug in your own custom strategies. The third-party Cassandra Java client Hector claims to support node failover.

Read/Write Optimization

Read or write optimized? Cassandra is write-optimized, whereas Voldemort reads are faster. Cassandra uses a journaling and compaction model. Writes are nearly instantaneous in that they simply append an entry to the commit log and update an in-memory table. Reads are more expensive since they potentially have to look at more than one SSTable file to find the latest version of a key. If you are lucky you will find it cached in memory – otherwise one or more disk accesses will have to be performed. In a way the comparison is not truly apples-to-apples, since Voldemort is simply storing blobs while Cassandra has to deal with the overhead of its data model. However, it is curious to see two products that are basically K/V stores with such different performance profiles on this vital issue.
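The write-optimized path can be illustrated with a toy log-structured store: writes land in an in-memory memtable that is periodically frozen into an immutable sorted table, while reads check the memtable and then each table from newest to oldest. This is a deliberately simplified model with hypothetical names (no commit log, bloom filters, deletes or compaction), not Cassandra’s code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy log-structured store: memtable plus immutable sorted tables, newest first. */
class ToyLsmStore {
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final Deque<SortedMap<String, String>> sstables = new ArrayDeque<>(); // head = newest
    private final int flushThreshold;
    private int lastReadCount;

    ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Cheap write: update the in-memory table, freezing it once it is full. */
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current memtable into an immutable sorted table. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.addFirst(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Potentially expensive read: memtable first, then tables from newest to oldest. */
    String get(String key) {
        lastReadCount = 0;
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (SortedMap<String, String> table : sstables) {
            lastReadCount++;
            value = table.get(key);
            if (value != null) {
                return value; // the newest version wins
            }
        }
        return null;
    }

    /** How many sorted tables the most recent get() had to consult. */
    int sstablesReadOnLastGet() {
        return lastReadCount;
    }
}
```

The counter returned by sstablesReadOnLastGet shows why reads slow down as tables accumulate, which is the problem compaction addresses.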

Pluggable store engines

Voldemort supports pluggable engines, Cassandra does not. This is a big plus for Voldemort! Out of the box, Voldemort already provides Berkeley DB and MySQL engines and allows you to easily plug in your own custom engine. Being able to implement your own backing store is an important concern for many shops. In fact, on my recent project for a large telecom this was a crucial deal-breaking feature that played a large role in selecting Voldemort. We had in-house MySQL expertise and spent inordinate resources writing our own “highly optimized” MySQL engine. By the way, Riak also has pluggable engines – seven in total!

Dynamically adding column families

Neither Voldemort nor Cassandra supports this (though Cassandra should soon). In order to add a new “database” or “table”, you need to update the configuration file and recycle all servers. Obviously this is not a viable production strategy. Riak does support this with buckets.

Quorum Policies

Quorum policies – Voldemort has one, while Cassandra has several consistency levels:

  • Zero – Ensure nothing. A write happens asynchronously in the background
  • Any – Ensure that the write has been written to at least 1 node
  • One – Ensure that the write has been written to at least 1 replica’s commit log and memory table before responding to the client
  • Quorum – Ensure that the write has been written to N / 2 + 1 replicas before responding to the client
  • DCQuorum – As above but takes into account the rack-aware placement strategy
  • All – Ensure that the write is written to all N replicas before responding to the client
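The arithmetic behind these levels is simple enough to write down. For a replication factor N, the helper below returns how many nodes must acknowledge a write before the client is answered; for example, N = 3 gives a quorum of 3 / 2 + 1 = 2 (integer division). WriteLevel is a hypothetical enum, not Cassandra’s ConsistencyLevel class, and DCQuorum is left out because it depends on how replicas are spread across data centers.

```java
/** Acknowledgements a write must collect before the client is answered (hypothetical enum). */
enum WriteLevel {
    ZERO, ANY, ONE, QUORUM, ALL;

    /** @param n replication factor (number of replicas for the key) */
    int requiredAcks(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        switch (this) {
            case ZERO:   return 0;         // fire and forget
            case ANY:    return 1;         // any node, even one that only stores a hint
            case ONE:    return 1;         // one real replica
            case QUORUM: return n / 2 + 1; // strict majority of replicas
            case ALL:    return n;         // every replica
            default:     throw new AssertionError(this);
        }
    }
}
```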

Hinted Hand-off

Cassandra and Voldemort both support hinted handoff. Riak also supports it.

Cassandra:

If a node which should receive a write is down, Cassandra will write a hint to a live replica node indicating that the write needs to be replayed to the unavailable node. If no live replica nodes exist for this key, and ConsistencyLevel.ANY was specified, the coordinating node will write the hint locally. Cassandra uses hinted handoff as a way to (1) reduce the time required for a temporarily failed node to become consistent again with live ones and (2) provide extreme write availability when consistency is not required.

Voldemort:

Hinted Handoff is extremely useful when dealing with a multiple datacenter environment. However, work remains to make this feasible.

Riak:

Hinted handoff is a technique for dealing with node failure in the Riak cluster in which neighboring nodes temporarily take over storage operations for the failed node. When the failed node returns to the cluster, the updates received by the neighboring nodes are handed off to it.

Hinted handoff allows Riak to ensure database availability. When a node fails, Riak can continue to handle requests as if the node were still there.
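The Cassandra-style flow described above, storing a hint for a down replica and replaying it when the node returns, can be modelled in a few lines. This single-process sketch uses hypothetical names and simplifies where the hint lives (one shared list rather than a live replica or the coordinator); networking, hint expiry and failure detection are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of hinted handoff across a few in-process "nodes". */
class HintedHandoffModel {
    /** A write that could not be delivered, remembered on behalf of its intended node. */
    static final class Hint {
        final String targetNode, key, value;

        Hint(String targetNode, String key, String value) {
            this.targetNode = targetNode;
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Map<String, String>> nodeData = new HashMap<>();
    private final Set<String> downNodes = new HashSet<>();
    private final List<Hint> hints = new ArrayList<>();

    HintedHandoffModel(String... nodes) {
        for (String n : nodes) {
            nodeData.put(n, new HashMap<String, String>());
        }
    }

    void markDown(String node) {
        downNodes.add(node);
    }

    /** Writes to each replica; an unavailable replica gets a hint instead. */
    void write(List<String> replicas, String key, String value) {
        for (String node : replicas) {
            if (downNodes.contains(node)) {
                hints.add(new Hint(node, key, value));
            } else {
                nodeData.get(node).put(key, value);
            }
        }
    }

    /** The node recovers: hand off every hint destined for it. Returns how many were replayed. */
    int markUp(String node) {
        downNodes.remove(node);
        int replayed = 0;
        for (Iterator<Hint> it = hints.iterator(); it.hasNext(); ) {
            Hint h = it.next();
            if (h.targetNode.equals(node)) {
                nodeData.get(node).put(h.key, h.value);
                it.remove();
                replayed++;
            }
        }
        return replayed;
    }

    String read(String node, String key) {
        return nodeData.get(node).get(key);
    }
}
```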

Read Repair

Cassandra Read Repair

Read repair means that when a query is made against a given key, we perform that query against all the replicas of the key. If a low ConsistencyLevel was specified, this is done in the background after returning the data from the closest replica to the client; otherwise, it is done before returning the data.

This means that in almost all cases, at most the first instance of a query will return old data.

Voldemort

There are several methods for reaching consistency with different guarantees and performance tradeoffs.

Two-Phase Commit: This is a locking protocol that involves two rounds of co-ordination between machines. It is perfectly consistent, but not failure tolerant, and very slow.

Paxos-style consensus: This is a protocol for coming to agreement on a value that is more failure tolerant.

Read-repair: The first two approaches prevent permanent inconsistency. This approach involves writing all inconsistent versions, and then at read-time detecting the conflict, and resolving the problems. This involves little co-ordination and is completely failure tolerant, but may require additional application logic to resolve conflicts.

Riak

Read repair occurs when a successful read occurs – that is, the quorum was met – but not all replicas from which the object was requested agreed on the value. There are two possibilities here for the errant nodes:

  1. The node responded with a “not found” for the object, meaning it doesn’t have a copy.
  2. The node responded with a vector clock that is an ancestor of the vector clock of the successful read.

When this situation occurs, Riak will force the errant nodes to update their object values based on the value of the successful read.

Version Conflict Resolution – Vector Clocks

Cassandra

Cassandra departs from the Dynamo paper by omitting vector clocks and moving from partition-based consistent hashing to key ranges, while adding functionality like order-preserving partitioners and range queries. Source.

Voldemort

Voldemort uses Dynamo-style vector clocks for versioning.

Riak

Riak utilizes vector clocks (short: vclock) to handle version control. Since any node in a Riak cluster is able to handle a request, and not all nodes need to participate, data versioning is required to keep track of a current value. When a value is stored in Riak, it is tagged with a vector clock and establishes the initial version. When it is updated, the client provides the vector clock of the object being modified so that this vector clock can be extended to reflect the update. Riak can then compare vector clocks on different versions of the object and determine certain attributes of the data.
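A vector clock can be sketched as a map from node ID to counter. Comparing two clocks entry by entry tells whether one version descends from the other (the ancestor case in the Riak read-repair description) or whether they are concurrent, i.e. a conflict the application must resolve. This is a minimal illustration with hypothetical names, not Voldemort’s or Riak’s implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Immutable vector clock: node ID -> number of updates coordinated by that node. */
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters;

    VectorClock() {
        this(new HashMap<String, Long>());
    }

    private VectorClock(Map<String, Long> counters) {
        this.counters = counters;
    }

    /** Returns a new clock with nodeId's counter incremented (this clock is unchanged). */
    VectorClock increment(String nodeId) {
        Map<String, Long> copy = new HashMap<>(counters);
        copy.merge(nodeId, 1L, Long::sum);
        return new VectorClock(copy);
    }

    /** Compares entry by entry; a missing entry counts as 0. */
    Order compare(VectorClock other) {
        boolean less = false;
        boolean greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String n : nodes) {
            long mine = counters.getOrDefault(n, 0L);
            long theirs = other.counters.getOrDefault(n, 0L);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT; // conflicting siblings
        if (less) return Order.BEFORE;                // this is an ancestor of other
        if (greater) return Order.AFTER;              // this descends from other
        return Order.EQUAL;
    }
}
```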

Other

Synchronous/asynchronous writes

For Voldemort, inserts of a key’s replicas are synchronous. Cassandra allows you to choose which policy best suits you. For cross-data center replication, synchronous updates can be extremely slow.

Caching

Cassandra caches data in-memory, periodically flushing to disk. Voldemort does not cache.

Initial Python Impressions

August 12, 2010

This post is about my initial impressions of Python in the context of an “official” Python production project. I have long used Unix scripts for basic scripting needs, and occasionally used Python (Perl less so) for more substantial tasks, but that has always been “unofficial”. My latest gig involved deploying a Python program to listen for incoming AWS SQS messages and dispatch them to a downstream processing engine (business logic, MySQL database).

Though Java has been my bread ‘n butter since its inception, I am firmly in the camp of language non-bigots. I was a coder long before Java, and it is hardly the only show in town. It basically boils down to the best tool for the task at hand. After all, it is all about tools – that’s what launched us Homo sapiens onto our current trajectory towards ultimate civilization.

Python is certainly enticing, and I fully appreciate its appeal. For example, there’s no question that a Python dictionary is so much more convenient to define than a Java map:

mydict = {}

instead of its Java equivalent of:

Map<String,Integer> map = new HashMap<String,Integer>();

or the upcoming Java 7 syntax improvement with type inference via the diamond operator:

Map<String,Integer> map = new HashMap<>();

You can also use Google Guava utilities such as Maps.newHashMap() to mitigate this issue for now.

It is obviously so much easier to “whip out” a Python program to execute some basic functionality than a Java equivalent. The crux of the dilemma is: convenience for developers vs. long-term operations concerns.

It basically boils down to two issues (not necessarily unrelated):

  • Type safety
  • Size of team

If you’re one developer or a tight group of like-minded developers, then type safety issues can be mitigated by convention and mind-meld. However, as soon as the team grows and the life cycle of the application is extended (original developers are no longer involved in maintenance), problems begin. It’s hard to imagine a dynamically typed language such as Python holding up as well as Java for a large-scale development team where unrelated developers and hundreds of thousands of lines of code are involved.

For example, without explicit typing, new developers are forced to drill down into the source code to verify method signatures. Typically in the Java world this is handled by Javadoc, IDE magic or a mere perusal of source signatures. In Python, you cannot merely look at a method’s signature, since it declares no types – you have to actually read the method’s entire body and trace all its return values (cyclomatic complexity).

An interesting recent article precisely looks at these issues in the migration from Python to Java for Nuxeo’s CMS – see here.

In order to bullet-proof production code, the developer is forced to “play compiler”. To compensate for the lack of a compiler, much of the type checking has to be done by unit tests; these unit tests would not exist in the Java world. These tests are basically accidental complexity – extra cost – and exist only for type safety. Here the chickens come home to roost – the trade-off between developer ease of use and run-time stability. Senior Python developers have told me that the “safe” way is to check function return values either by using “isinstance()” or by checking for specific attributes with “hasattr()”. Whew! This just doesn’t “smell right” to me – too dependent on the whims of individuals. The stuff of nightmares for operations folks trying to discern what went wrong at 3 AM!

One particular place where I noticed this can cause run-time production problems is the rarely executed “except” clause of a try/except (Java’s try/catch). I ran into unpleasant surprises due to Python’s inexplicable refusal to implicitly convert values of different types when building a message string. Where Java easily concatenates distinct types, Python requires you to convert everything to a string with the str() function if you wish to use the “+” operator – using “,” in a print statement you don’t, but formatting suffers. Whew, a bit of inconsistency I’d say. You’ll never know this is a problem until an error happens.

Another Python cultural issue that strikes me as “strange” is the lack of true multi-threading due to the GIL (Global Interpreter Lock). This limitation seems to be an arbitrary constraint imposed by the BDFL (Benevolent Dictator for Life). Sure, threading is a non-trivial issue – like any tool it can be used or abused. But to summarily dismiss it and force people to spawn processes strikes me as arbitrary and ultimately retro.

Threading concerns can be divided into two basic types:

  • Threads that access shared resources that need to be synchronized. Care, diligence and discipline need to be exercised.
  • Threads that access external resources and require no synchronization. Goetz et al., in their seminal book Java Concurrency in Practice, call these deferred computations. Since there is no synchronization, programmer complexity is greatly reduced.

It is the latter that is used more often, and is thus more important. Forcing users to always spawn processes is unnecessary accidental complexity. For an interesting recent perspective on the subject, see Michele Simionato’s Artima post Threads, processes and concurrency in Python: some thoughts.

No SQL Taxonomy

May 13, 2010

In the last year or so there has been an incredible explosion of interest in the concept of No SQL. There are so many varying implementations that differ so wildly that it is often difficult to get a clear picture of what is what. Typically authors will either be intimately involved with one specific project or will give cursory overviews of a number of projects.

What is needed is some basic categorization and classification – in other words, a taxonomy of the No SQL provider space. For example, what are the key criteria used to classify implementations? There are disjoint subsets in the No SQL space, and comparisons can only be made between subsets or between implementations within a given subset. It’s all about apples-to-apples comparisons.

Here are a few links to shed light on the topic:

And of course let us not forget the contrarian view:

Twitter User Similarity and Collective Intelligence

May 13, 2010

This is the second part of a blog series regarding a recent mini-project where I implemented a Twitter user similarity service. The first part described my experience with the mechanics of the Twitter REST API – this part focuses more on the “collective intelligence” aspects.

The requirements were simple: define a concept of “similarity for two users” and implement a Twitter solution for it.

Resources used:

Being a bit rusty on basic CI concepts (to my chagrin, but in my defense there is so much computer stuff to know out there), I did a quick search for high-quality links on the topic, drilled down a bit and read the high-value articles. I downloaded all the free PDF chapters of the books and read the relevant sections. I went to my local Borders bookstore, which was fortunately stocked with all the above books. I had already purchased Segaran’s book, so I used chapter 2, “Making Recommendations”, which discusses the Euclidean distance and Pearson correlation formulas; these seemed to fit the bill. AIW also had an even more detailed discussion of the subject – too much to implement in the short time frame, but definitely a candidate for version two. I reviewed my statistics books, and lo and behold, it turned out these were not exotic algorithms, but rather standard statistical data comparison techniques. To paraphrase an old sailing jingle (so many boats, so little time): so much knowledge, so little time.

I settled upon defining the concept of similarity based on comparing word counts between two users for a set of Twitter status messages over a given timeline. As usual I leveraged Spring for effortless configuration and bean wiring (thanks again Rod!). The basic logic was to issue calls to the Twitter API “method” user_timeline for each user. This returned a list of tweets for each user, which I would iterate over, concatenating the Status text elements. I then computed, for each user, a map of words and a count of all their occurrences. The two maps were then fed to the similarity scorer, which would return a value between 0 and 1.

Last but not least was the WordCounter class. This object accepts raw text and returns a map of words and their counts. Of special interest is the lexical analyzer. For the first pass I used a simple String.split() and a list of stop words. But minimal analysis revealed a submerged world of complexity involving punctuation, stemming, etc. Whew! Ideally it too should be an interface.
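A minimal sketch of such a WordCounter follows. It assumes a first-pass analyzer that lowercases the text, splits on runs of non-letter characters (a little more aggressive than a plain whitespace String.split(), so trailing punctuation disappears), and drops words found in a small, hypothetical stop-word list; stemming is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class WordCounter {

    // Hypothetical stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "is");

    // Returns a map of each remaining word to its number of occurrences.
    Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // "[^\\p{L}]+" splits on any run of non-letter characters,
        // so "REST," and "rest" both become "rest".
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // a leading delimiter yields an empty first token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```

Note that apostrophes still split words ("don't" becomes "don" and "t"), which is exactly the kind of submerged complexity a better analyzer would have to handle.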

Here’s a UML class diagram of the overall system:

The entry point is a service which returns a double value between 0 and 1 indicating user similarity.

    public interface SimilarityService {
        public double getSimilarityScore(String user1, String user2);
    }

This service interface has four implementations: two real providers (Twitter4j and JTwitter) that issue actual calls to the Twitter API for the two user timelines; a mock implementation that operates on files containing the concatenated raw text; and, as an inspirational freebie, an RssSimilarity provider that performs the similarity scoring on RSS feeds. It’s quite cool how much can be done so easily and quickly when you’ve got the right abstractions and layering in place. Nothing excessively fancy here, just solid engineering practices all wrapped in rocking Spring. The other extension point was the similarity scorer, which computes a similarity score for two word count maps.

    public interface SimilarityScorer {
         public double calculateSimilarityScore(Map<String, Integer> wordCount1,
              Map<String, Integer> wordCount2);
    }

The two provided implementations are:

  • Euclidean Distance
  • Pearson Correlation
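Below is a sketch of the two scorers against the SimilarityScorer interface, restated here with assumed Map<String, Integer> type parameters. Two further assumptions: a word missing from one user's map counts as 0, and, because Pearson's r ranges from -1 to 1, it is rescaled to (r + 1) / 2 so both scorers return a value between 0 and 1. The Euclidean distance d is turned into a similarity with 1 / (1 + d), one common convention (Segaran's book uses a similar form); `Vocabulary` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

interface SimilarityScorer {
    double calculateSimilarityScore(Map<String, Integer> wordCount1,
                                    Map<String, Integer> wordCount2);
}

final class Vocabulary {
    // Union of both vocabularies; a word missing from one map counts as 0.
    static Set<String> union(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> words = new HashSet<>(a.keySet());
        words.addAll(b.keySet());
        return words;
    }
}

class EuclideanScorer implements SimilarityScorer {
    // Maps Euclidean distance d to 1 / (1 + d): identical maps score 1,
    // and the score approaches 0 as the distance grows.
    public double calculateSimilarityScore(Map<String, Integer> wordCount1,
                                           Map<String, Integer> wordCount2) {
        double sumSquares = 0;
        for (String w : Vocabulary.union(wordCount1, wordCount2)) {
            double diff = wordCount1.getOrDefault(w, 0) - wordCount2.getOrDefault(w, 0);
            sumSquares += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }
}

class PearsonScorer implements SimilarityScorer {
    // Pearson's r lies in [-1, 1]; it is rescaled to [0, 1] as (r + 1) / 2.
    public double calculateSimilarityScore(Map<String, Integer> wordCount1,
                                           Map<String, Integer> wordCount2) {
        Set<String> words = Vocabulary.union(wordCount1, wordCount2);
        int n = words.size();
        if (n == 0) return 0.5; // no data: treat as uncorrelated (r = 0)
        double sum1 = 0, sum2 = 0, sumSq1 = 0, sumSq2 = 0, sumProducts = 0;
        for (String w : words) {
            double x = wordCount1.getOrDefault(w, 0);
            double y = wordCount2.getOrDefault(w, 0);
            sum1 += x;
            sum2 += y;
            sumSq1 += x * x;
            sumSq2 += y * y;
            sumProducts += x * y;
        }
        double numerator = sumProducts - sum1 * sum2 / n;
        double denominator = Math.sqrt((sumSq1 - sum1 * sum1 / n) * (sumSq2 - sum2 * sum2 / n));
        if (denominator == 0) return 0.5; // a constant vector: r is undefined, treat as 0
        return (numerator / denominator + 1.0) / 2.0;
    }
}
```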

Other possible candidate solutions to be investigated are:

  • Manhattan (taxicab) distance
  • Jaccard distance
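The two candidates are small enough to sketch as well. Assuming, as a convention, that a word missing from one map counts as 0, Manhattan distance sums absolute count differences and is mapped to a similarity with 1 / (1 + d), while Jaccard ignores counts entirely and compares only the two vocabularies; the Jaccard distance is 1 minus the similarity computed here. `CandidateScores` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CandidateScores {

    // Manhattan (taxicab) distance: sum of absolute count differences,
    // mapped to (0, 1] via 1 / (1 + d).
    static double manhattanSimilarity(Map<String, Integer> wc1, Map<String, Integer> wc2) {
        Set<String> words = new HashSet<>(wc1.keySet());
        words.addAll(wc2.keySet());
        double distance = 0;
        for (String w : words) {
            distance += Math.abs(wc1.getOrDefault(w, 0) - wc2.getOrDefault(w, 0));
        }
        return 1.0 / (1.0 + distance);
    }

    // Jaccard similarity of the two vocabularies: size(A intersect B) / size(A union B).
    // The Jaccard distance is 1 minus this value. Word counts are ignored.
    static double jaccardSimilarity(Map<String, Integer> wc1, Map<String, Integer> wc2) {
        Set<String> union = new HashSet<>(wc1.keySet());
        union.addAll(wc2.keySet());
        if (union.isEmpty()) return 1.0; // convention chosen here: two empty vocabularies match
        Set<String> intersection = new HashSet<>(wc1.keySet());
        intersection.retainAll(wc2.keySet());
        return (double) intersection.size() / union.size();
    }
}
```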

Overall, this was one of the more intellectually challenging projects in a while. On the “interest” scale it certainly compares with the NoSQL and eventual consistency stuff I’ve been recently doing. I certainly aim to pursue this topic more – hopefully in a remunerated capacity!

Twitter REST API

May 13, 2010

I recently finished a mini-project to calculate the similarity of two Twitter users. Being a REST fan and having implemented a substantial REST service, I’m always eager for an excuse to get my hands dirty with a bit o’ REST. I find it fascinating that no two REST APIs (web APIs) are the same, especially in terms of documentation.

As we all know, the term “REST” refers to a style and not a standard, so it is not surprising that actual implementations vary quite a bit. This lack of specificity is most pronounced on the client side. With SOAP, client access is almost always mediated by platform-specific client stubs automatically generated from the WSDL contract. With REST there is no such standard contract (WADL exists but has not become a dominant force), and therefore no fully automated way to build client stubs. You can partially automate the process if you define your payload as an XML schema, but this still leaves the other vital part: specifying the resource model. Section 1.3 of the JAX-RS specification explicitly states: “The specification will not define client-side APIs.” See my comments on the current non-standard situation of client stubs in some JAX-RS providers.

API Data Formats

In the absence of standard client stub generation mechanisms, documentation plays an increasingly important role. The fidelity to genuine REST precepts and the terminology used to describe resources and their HTTP methods become of prime importance for effective client usage.

How do we unambiguously describe the different resources and methods? The number and types of payload formats influence the decision. Do we support only one format, JSON or XML? If XML, do we have a schema? If so, what schema language do we use? XSD or RelaxNG? Multiple XML formats such as Atom, RSS and/or proprietary XML? By the way, the former two do not have a normative schema. Do we support multiple formats? If so, do we use prescribed REST content negotiation?

Considering the strong presence of the Twitter REST API and my short albeit intense usage of it, I am a bit reluctant to “criticize”. So up front I issue a disclaimer that my knowledge is partial and subject to change. One very interesting fact I recently read in the book Building Social Web Applications is that over 80% of Twitter’s usage comes from its API and not from its web site! Caramba, that’s quite an ecosystem that has evolved around Twitter! All the more reason to invest in API contract specification and documentation.

General API Documentation

Professional, high-quality API documentation is obviously a vital need, especially as API usage increases. With an Internet consumer-facing API, clients can access resources using any language of choice, so it is important to be as precise as possible. Having worked with many different APIs and services, I have come to appreciate the importance of good documentation. I regard documentation not as a separate add-on to the executable code, but rather as an integral part of the experience. It is a first-order concern.

The metaphor I would suggest is DDD – Documentation Driven Development. In fact, on my last big REST project where I took on the responsibility of documenting the API, I soon found it more efficient to update the documentation as soon as any API change was made. This was especially true when data formats were modified! The document format was Atlassian Wiki which unfortunately didn’t allow for global cross-page changes, so I had to keep code and its corresponding documentation closely synchronized; otherwise the documentation would’ve quickly diverged and become unmanageable.

Deductive and Inductive Documentation

In general, documentation can be divided into deductive and inductive. Deductive documentation is based on the top-down approach. If you have an XML schema all the better – you can use this as your basic axiom, and derive all further documentation fragments in progressive refinement steps. Even in the absence of a schema, you can still leverage this principle.

Inductive documentation is solely based on examples – there is no general definition, and it is up to the client to infer commonalities. You practically have to do parallel diffs on many different XML examples to separate the common from the specific.

Theorem: all good documentation must have examples but it cannot rely only on examples! In other words, examples are necessary but not sufficient.

Google API as a Model

Google has done a great job of extracting a common subset of its many public APIs into the Google Data Protocol. All Google APIs share this definition: common data formats, common error mechanisms, header specification, collection counts, etc. Google has standardized on AtomPub and JSON as its two primary data formats (with some RSS too). It does an excellent job of providing an unambiguous and clear specification of its entire protocol across all its API instances.

Take the YouTube API for example. Although neither Google nor Atom uses an XSD schema, the precise details of the format are clearly described. Atom leverages the concept of extensions, where you can insert external namespaces (vocabularies) into the base Atom XML. Google’s Atom feeds do not have to reinvent the wheel for cross-cutting extensions, and can reuse common XML vocabularies in a standard way. See the Data API Protocol – XML element definitions page for details. Examples are openSearch (Open Search Schema) for collection counts and paging, and media for MRSS (yes, you can insert Media RSS elements into Atom – cool!).

Twitter Data Format Documentation

The Twitter General API documentation page and the FAQ do not do a good job of giving a high-level description of the Twitter data formats for requests and responses. There is only one basic mention of this on the Things Every Developer Should Know page:

The API presently supports the following data formats: XML, JSON, and the RSS and Atom syndication formats, with some methods only accepting a subset of these formats.

No clear indication is given as to which kinds of resources accept which formats. Common sense would lead us to believe that JSON and proprietary XML are isomorphic and supported for both requests and responses. Being feed formats, RSS and Atom would be supported only for responses. It is unfortunate that this is not explicitly stated anywhere.

More disturbing is the lack of an XML schema or any attempt to formally define the XML vocabulary! It seems that only XML examples are provided for each resource. Googling for “Twitter API XSD” confirms my suspicion in that it returns many mentions of “inferring XML schemas” from instances – a scary proposition indeed! What is the cardinality of XML elements? Which ones are repeatable? What about the data types? The DRY (don’t repeat yourself) principle is violated since the same XML example is redundantly repeated on many pages. You can maybe get away with this for a small API, but for an API as widely used as Twitter’s I would have expected more investment in API contract specification.

Twitter Content Negotiation

Another concern is the way Twitter handles content negotiation. Instead of using the REST convention/standard of the Accept header or a format query parameter, Twitter appends the format type to the URL (.json, .xml), which in effect creates a new resource. By contrast, Google GData uses a query parameter such as alt=json or alt=rss to indicate the data format.
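To make the three conventions concrete, here is a hypothetical FormatResolver a server might use to pick a representation. The precedence order (URL suffix, then an alt query parameter, then the Accept header) and the fallback to XML are arbitrary choices for illustration, and the Accept handling is simplified: it ignores q-values and takes the first recognized media type.

```java
import java.net.URI;
import java.util.Locale;

class FormatResolver {

    // Returns the requested format ("json", "xml", "atom", "rss") using three
    // conventions, checked in this (arbitrary) order:
    //   1. URL suffix, Twitter style:        /statuses/user_timeline.json
    //   2. alt query parameter, GData style: /feeds/api/videos?alt=json
    //   3. HTTP Accept header:               Accept: application/json
    static String resolve(URI uri, String acceptHeader) {
        String path = uri.getPath();
        int dot = path.lastIndexOf('.');
        if (dot > path.lastIndexOf('/')) {
            return path.substring(dot + 1).toLowerCase(Locale.ROOT);
        }
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.startsWith("alt=")) {
                    return pair.substring(4).toLowerCase(Locale.ROOT);
                }
            }
        }
        if (acceptHeader != null) {
            // Simplification: ignores q-values and picks the first recognized type.
            for (String type : acceptHeader.split(",")) {
                String mime = type.split(";")[0].trim().toLowerCase(Locale.ROOT);
                switch (mime) {
                    case "application/json": return "json";
                    case "application/atom+xml": return "atom";
                    case "application/rss+xml": return "rss";
                    case "application/xml":
                    case "text/xml": return "xml";
                    default: break; // keep looking
                }
            }
        }
        return "xml"; // hypothetical default
    }
}
```

The suffix style bakes the format into the resource identifier, whereas the Accept header leaves one URI per resource and lets the client state its preference.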

Twitter API Versioning

This lack of explicit contract specification leads to problems regarding versioning. Versioning is one of those very difficult API problems that has no ideal satisfactory answer. Instead, there are partial solutions depending on the use case. Without a contract, it is difficult to even know what new changes have been implemented.

Let’s say a change is made to an XML snippet that is shared across many resource representations. Twitter would have to make changes to each resource documentation page. Even worse, it then has to have some mechanism to inform clients of the contract change. Having some schema, or at least some common way of describing the format, would be a much better idea. The Twitter FAQ weakly states that clients have to proactively monitor the following:

Google uses HTTP headers to indicate API versions as well as standard namespace naming conventions.

API Client Packages

Typically REST APIs will leave it to third parties to provide language-specific client stubs. This is understandable because of the large number of languages out there – it would be prohibitive for a small(er) company to implement and test all these packages! However, the downside is that these packages are by definition non-standard (caveat emptor), and it is an open question how reliably they implement the current service definition.

Documentation varies wildly. Firstly, it is not always clear which client package to use if several choices exist. Being mostly a Java guy, I focus here on Java clients. For example, Twitter has four Java clients listed. You most often find minimal Javadoc with no further explanation. API coverage is incomplete – features are missing. For example, for the user_timeline “method” (sidebar: misuse of the REST term method!), Twitter4j supports the count query parameter whereas JTwitter apparently does not. The problem here is client fidelity to the API. When the Twitter API changes, what is the guarantee that the “mom and pop” client will sync up?

Speaking of the devil, a rather interesting development happened just the other day with Amazon’s AWS Java client. On March 22, 2010, Amazon announced the rollout of a brand-new AWS SDK for Java! Until then, they too had depended on third parties – the venerable Jets3t (which supports only S3 and CloudFront) and typica. Undoubtedly this was due to client pressure for precisely the reasons enumerated above! See Mr. Jets3t’s comment on the news.

Conclusion

One of the wonders of the human condition is how people manage to work around major obstacles when there is an overriding need. The plethora of Twitter clients in the real world is truly a testimony to the ubiquity of the Twitter API.

However, there is still considerable room for improvement to remove accidental complexity and maximize client productivity. Twitter is still a young company and there is an obvious maturation process ahead.

After all, Google has many more resources to fine-tune the API experience. But if I were the chief Twitter API architect, I would certainly take a long, hard look at the strategic direction of the API. Obviously there is major momentum in this direction, especially with the June deprecation of Basic Auth in favor of OAuth and the realignment of the REST and Search APIs. There is no reason to blindly mimic someone else’s API documentation style (think branding), but even less reason not to learn from others (prior knowledge) and minimize client cognitive overhead.

VTest Testing Framework

April 12, 2010

In order to test basic Voldemort API methods under specified realistic load scenarios, I leveraged the “VTest” framework that I had previously written for load testing. VTest is a light-weight Spring-based framework that separates the execution strategy from the business tasks and provides cross-cutting features such as statistics gathering and reporting.

The main features of VTest are:

  • Declarative workflow-based testing framework based on Spring
  • Separation of concerns: framework, executor, job, task, key and value generation strategies
  • Implementations of these concerns are all pluggable and configurable via Spring dependency injection and bean wiring
  • Framework handles cross-cutting concerns: error handling, call statistics, result reporting, and result persistence
  • Conceptually inspired by java.util.concurrent’s Executor
  • Executors: SequentialExecutor, FixedThreadPoolExecutor, ScheduledThreadPoolExecutor
  • Executor invokes a job or an individual task
  • A task is a unit of work – a job is a collection of tasks
  • Tasks are implemented as Java classes
  • Jobs are specified as lists of tasks in Spring XML configuration file
  • VTest configuration acts as a high-level testing DSL (Domain Specific Language)
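To illustrate the separation of executor and task, here is a minimal sketch of the abstractions listed above. It is not VTest's source: `Task`, `TaskExecutor`, `CallCounts`, `SequentialTaskExecutor`, `FixedThreadPoolTaskExecutor` and `Job` are hypothetical names modeled on the descriptions, and the statistics are reduced to call and error counts.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A task is a unit of work, invoked once per request.
interface Task {
    String getName();
    void execute(long requestIndex) throws Exception;
}

// An executor decides how a task's requests are run; the task decides what runs.
interface TaskExecutor {
    CallCounts execute(Task task);
}

// Cross-cutting statistics handled by the framework, reduced here to two counters.
class CallCounts {
    final AtomicLong calls = new AtomicLong();
    final AtomicLong errors = new AtomicLong();

    // Runs one request and records it; an exception counts as an error.
    void record(Task task, long requestIndex) {
        try {
            task.execute(requestIndex);
        } catch (Exception e) {
            errors.incrementAndGet();
        } finally {
            calls.incrementAndGet();
        }
    }
}

class SequentialTaskExecutor implements TaskExecutor {
    private final long numRequests;

    SequentialTaskExecutor(long numRequests) {
        this.numRequests = numRequests;
    }

    public CallCounts execute(Task task) {
        CallCounts counts = new CallCounts();
        for (long i = 0; i < numRequests; i++) {
            counts.record(task, i);
        }
        return counts;
    }
}

class FixedThreadPoolTaskExecutor implements TaskExecutor {
    private final long numRequests;
    private final int threadPoolSize;

    FixedThreadPoolTaskExecutor(long numRequests, int threadPoolSize) {
        this.numRequests = numRequests;
        this.threadPoolSize = threadPoolSize;
    }

    public CallCounts execute(Task task) {
        CallCounts counts = new CallCounts();
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        for (long i = 0; i < numRequests; i++) {
            final long requestIndex = i;
            pool.execute(() -> counts.record(task, requestIndex));
        }
        pool.shutdown(); // accept no new work; queued requests still drain
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }
}

// A job is an ordered list of tasks, each run to completion before the next.
class Job {
    static void run(List<Task> tasks, TaskExecutor executor) {
        for (Task task : tasks) {
            CallCounts c = executor.execute(task);
            System.out.printf("%-10s calls=%d errors=%d%n",
                    task.getName(), c.calls.get(), c.errors.get());
        }
    }
}
```

The fixed pool queues every request up front and drains the queue with threadPoolSize workers; for very large request counts a bounded queue would keep memory in check.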

Sample Result

Here is a sample output for a CRUD job that puts one million key/value pairs, gets them, updates them and finally deletes them. Each task is executed for N requests – N being one million – with a thread pool of 200. The pool acts as a Leaky Bucket (thanks to Joe for this handy reference). The job is executed for five iterations and both the details of each individual run and the aggregated result are displayed.

Description of columns:

  • Req/Sec – requests per second or throughput
  • Ratio – the task’s fraction of the total job time. It is inversely related to throughput – the higher the ratio, the lower the throughput.
  • The five % columns represent standard latency percentiles in milliseconds. For example, in the first PutCreate row the 99th percentile is 384, meaning that 99% of the requests took 384 milliseconds or less.
  • Max – maximum latency. It is instructive to see that for large request sets, the 99.9 percentile doesn’t accurately portray the slowest requests. Notice that for the first PutCreate the Max is over five seconds whereas the 99.9 percentile is only 610 milliseconds. There’s a lot going on in that last 0.1% of requests! In fact, Vogels makes the point that Amazon focuses less on averages than on reducing these extreme “outliers”.
  • Errors – number of exceptions thrown by the server. There is an example in the third PutUpdate.
  • Fails – number of failures. A failure is when the server does not throw an exception but the business logic deems the result incorrect. For example, if the retrieved value does not match its expected value, a failure is noted. Observe that there are 28,932 failures for the third Get – a rather worrisome occurrence.
  • StdDev – standard deviation
==== DETAIL STATUS ============ 

Test         Req/Sec    50%    90%    99%  99.5%  99.9%    Max  Errors  Fails  StdDev
PutCreate       9921      7     29    384    454    610   5022       0      0   61.31
PutCreate       9790      7     31    358    427    516    707       0      0   55.23
PutCreate       8727      7     32    398    457    558    980       0      0   63.98
PutCreate      14354      7     26    122    213    375    613       0      0   27.51
PutCreate       8862      7     31    402    461    577    876       0      0   63.65
Total           9639      7     30    376    442    547   5022       0      0   58.03  

Test         Req/Sec    50%    90%    99%  99.5%  99.9%    Max  Errors  Fails  StdDev
Get            24364      6     10     78     88    114    440       0      0   11.35
Get            23568      6     11     81     89    159    320       0      0   12.31
Get            22769      7     11     81     89    109    381       0  28932   11.93
Get            23174      7     10     80     87     99    372       0      0   11.78
Get            22919      7     10     80     89    216    369       0      0   13.33
Total          23264      7     10     80     88    110    440       0  28932   12.15  

Test         Req/Sec    50%    90%    99%  99.5%  99.9%    Max  Errors  Fails  StdDev
PutUpdate       6555     11     32    554    943   1115   2272       0      0  101.49
PutUpdate       6412     11     32    574    900   1083   2040       0      0  101.99
PutUpdate       2945      3     10   4007   4009   4020   6010       1      0  494.14
PutUpdate       6365     11     35    537    746   1101   2118       0      0   97.55
PutUpdate       6634     11     32    537    853   1095   1293       0      0   98.18
Total           5668     10     31    554    978   4008   6010       1      0  197.87  

Count  Exception
1      class voldemort.store.InsufficientSuccessfulNodesException  

Test         Req/Sec    50%    90%    99%  99.5%  99.9%    Max  Errors  Fails  StdDev
Delete          6888     17     46    266    342    442    860       0      0   44.37
Delete          7649     17     43    176    263    395    619       0      0   34.11
Delete          8156     17     43    133    153    244    423       0   8544   25.03
Delete          7539     17     44    180    276    447    759       0      0   36.53
Delete          7457     17     43    218    285    420    714       0      0   38.02
Total           7494     17     44    203    280    410    860       0   8544   36.44  

=== SUMMARY STATUS ============
Test         Req/Sec  Ratio    50%    90%    99%  99.5%  99.9%    Max  Errors  Fails  StdDev
DeleteTable   307456  0.01       0      0      0      0      0      2       0      0    0.01
StoreCreate     9639  0.23       7     30    376    442    547   5022       0      0   58.03
Retrieve       23264  0.09       7     10     80     88    110    440       0  28932   12.15
StoreUpdate     5668  0.38      10     31    554    978   4008   6010       1      0  197.87
Delete          7494  0.29      17     44    203    280    410    860       0   8544   36.44
Total                                                                       1  37476         

Count  Exception
1      voldemort.store.InsufficientSuccessfulNodesException  

Config Parameters:
  requests           : 1000000
  threadPoolSize     : 200
  valueSize          : 1000
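The percentile, Max and StdDev columns above can be derived from the raw per-request latencies. This sketch assumes nearest-rank percentiles and a population standard deviation (VTest's exact definitions are not stated here), and expresses percentiles in per-mille so that 99.5 and 99.9 are exact integers: 500 is the median, 999 the 99.9th percentile. `LatencyStats` is a hypothetical name, and it expects at least one sample.

```java
import java.util.Arrays;

class LatencyStats {
    private final long[] sorted; // latencies in milliseconds, ascending

    LatencyStats(long[] latenciesMillis) {
        this.sorted = latenciesMillis.clone();
        Arrays.sort(this.sorted);
    }

    // Nearest-rank percentile, expressed in per-mille so that 99.5% (995) and
    // 99.9% (999) are computed with exact integer arithmetic.
    long percentile(int perMille) {
        long rank = ((long) perMille * sorted.length + 999) / 1000; // ceil(perMille * n / 1000)
        return sorted[(int) Math.max(rank, 1) - 1];
    }

    long max() {
        return sorted[sorted.length - 1];
    }

    // Population standard deviation of the latencies.
    double stdDev() {
        double mean = Arrays.stream(sorted).average().orElse(0);
        double sumSquares = 0;
        for (long v : sorted) {
            sumSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSquares / sorted.length);
    }

    // Throughput: requests divided by wall-clock seconds.
    static double requestsPerSecond(long requests, long elapsedMillis) {
        return requests * 1000.0 / elapsedMillis;
    }
}
```

With 999 fast requests and a single 5,022 ms outlier, the 99.9th percentile is still the fast value while Max is 5,022, which is the effect noted for the first PutCreate row.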

Sample Chart

Since call statistics are persisted in a structured XML file, the results can be post-processed and charts can be generated. The example below compares the throughput for four different record sizes: 1k, 2k, 3k and 5k. It is implemented using the popular open-source JFreeChart package.

VTest Job Configuration File

The jobs and tasks are defined and configured in a standard Spring configuration file. For ease-of-use, the dynamically varying properties are externalized in the vtest.properties file.

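<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/util
           http://www.springframework.org/schema/util/spring-util.xsd">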
  <bean id="propertyConfigurer"
        class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="locations" value="classpath:vtest.properties" />
    <property name="systemPropertiesMode" value="2" />
  </bean>

<!-- ** Jobs/Tasks ************************ -->

  <util:list id="crud.job"  >
    <ref bean="putCreate.task" />
    <ref bean="get.task" />
    <ref bean="putUpdate.task" />
    <ref bean="delete.task" />
  </util:list>

  <bean id="putCreate.task" class="com.amm.vtest.tasks.voldemort.PutTask" scope="prototype" >
    <constructor-arg ref="taskConfig" />
    <constructor-arg value="PutCreate" />
  </bean>

  <bean id="putUpdate.task" class="com.amm.vtest.tasks.voldemort.PutTask" scope="prototype" >
    <constructor-arg ref="taskConfig" />
    <constructor-arg value="PutUpdate" />
  </bean>

  <bean id="get.task" class="com.amm.vtest.tasks.voldemort.GetTask" scope="prototype" >
    <constructor-arg ref="taskConfig" />
  </bean>

  <bean id="delete.task" class="com.amm.vtest.tasks.voldemort.DeleteTask" scope="prototype" >
    <constructor-arg ref="taskConfig" />
  </bean>

  <bean id="taskConfig" class="com.amm.vtest.tasks.voldemort.VoldemortTaskConfig" scope="prototype" >
    <constructor-arg value="${cfg.store}" />
    <constructor-arg value="${cfg.urls}" />
    <constructor-arg value="${cfg.clientConfigFile}" />
    <property name="valueSize"      value="${cfg.valueSize}" />
    <property name="valueGenerator" ref="valueGenerator" />
    <property name="keyGenerator"   ref="keyGenerator" />
    <property name="checkValue"     value="${cfg.checkRetrieveValue}" />
  </bean>

<!-- ** VTest **************** -->

  <bean id="vtestProcessor"
        class="com.amm.vtest.VTestProcessor" scope="prototype">
    <constructor-arg ref="executor" />
    <constructor-arg ref="callStatsReporter" />
    <property name="warmup"          value="${cfg.warmup}" />
    <property name="logDetails"      value="true" />
    <property name="logDetailsAsXml" value="true" />
  </bean>

  <bean id="callStatsReporter"
        class="com.amm.vtest.services.callstats.CallStatsReporter" scope="prototype">
    <property name="properties" ref="configProperties" />
  </bean>

  <util:map id="configProperties">
    <entry key="requests" value="${cfg.requests}" />
    <entry key="threadPoolSize" value="${cfg.threadPoolSize}" />
    <entry key="valueSize" value="${cfg.valueSize}" />
  </util:map >

<!-- ** Executors **************** -->

  <alias alias="executor" name="fixedThreadPool.executor" />

  <bean id="sequential.executor"
        class="com.amm.vtest.SequentialExecutor" scope="prototype">
    <property name="numRequests" value="${cfg.requests}" />
  </bean>

  <bean id="fixedThreadPool.executor"
        class="com.amm.vtest.FixedThreadPoolExecutor" scope="prototype">
    <property name="numRequests"     value="${cfg.requests}" />
    <property name="threadPoolSize"  value="${cfg.threadPoolSize}" />
    <property name="logModulo"       value="${cfg.logModulo}" />
  </bean>

</beans>

VTest Properties

cfg.urls=tcp://10.22.48.50:6666,tcp://10.22.48.51:6666,tcp://10.22.48.52:6666
cfg.store=test_mysql
cfg.requests=1000000
cfg.valueSize=1000
cfg.threadPoolSize=200
cfg.clientConfigFile=client.properties
cfg.checkRetrieveValue=false
cfg.warmup=false
cfg.logModulo=1000
cfg.fixedKeyGenerator.size=36
cfg.fixedKeyGenerator.reset=true

Run Script

. common.env

CPATH="$CPATH;config"
PGM=com.amm.vtest.VTestDriver
STORE=test_bdb
CONFIG=vtest.xml

job=crud.job
iterations=1
requests=1000000
threadPoolSize=200
valueSize=1000

USAGE="Usage: $0 [-r requests] [-t threadPoolSize] [-v valueSize] [-i iterations] [job]"
opts="r:t:v:i:"
while getopts $opts opt
  do
  case $opt in
    r) requests=$OPTARG ;;
    t) threadPoolSize=$OPTARG ;;
    v) valueSize=$OPTARG ;;
    i) iterations=$OPTARG ;;
    \?) echo "$USAGE"
        exit 1;;
    esac
  done
shift `expr $OPTIND - 1`
if [ $# -gt 0 ] ; then
  job=$1
  fi

tstamp=`date "+%F_%H-%M"` ; logdir=logs-$job-$tstamp ; mkdir $logdir

PROPS=
PROPS="$PROPS -Dcfg.requests=$requests"
PROPS="$PROPS -Dcfg.threadPoolSize=$threadPoolSize"
PROPS="$PROPS -Dcfg.valueSize=$valueSize"

time -p java $PROPS -cp $CPATH $PGM $* \
  --config $CONFIG --iterations $iterations --job $job \
  | tee log.txt

cp -p log.txt log-*.xml times-*.txt *.log $logdir

XML Logging Output

The call statistics for each task run are stored in XML files for future reference and possible post-processing, e.g. charts, database persistence, cross-run aggregation. JAXB and an XSD schema are used to process the XML.

<callStats>
    <taskName>task-Put</taskName>
    <date>2010-04-04T21:50:21.459-04:00</date>
    <callsPerSecond>13215.975471149524</callsPerSecond>
    <elapsedTime>75666</elapsedTime>
    <standardDeviation>27.547028113708425</standardDeviation>
    <callRatio>0.34021105261028106</callRatio>
    <calls failures="0" errors="0" all="1000000"/>
    <percentiles>
        <percentile50>7.0</percentile50>
        <percentile90>31.0</percentile90>
        <percentile99>124.0</percentile99>
        <percentile995>163.0</percentile995>
        <percentile999>269.0</percentile999>
    </percentiles>
</callStats>
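JAXB is used above; as a dependency-free alternative, this sketch reads the same elements with the JDK's built-in DOM parser. It also recomputes throughput from the raw fields: 1,000,000 calls in 75,666 ms is about 13,216 calls per second, matching callsPerSecond in the sample. `CallStats` is a hypothetical class, not the JAXB-bound one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical holder for a few fields of one <callStats> document.
class CallStats {
    String taskName;
    long allCalls;
    long elapsedMillis;
    double callsPerSecond;
    double percentile99;

    // Throughput recomputed from the raw fields: calls / elapsed seconds.
    double recomputedCallsPerSecond() {
        return allCalls * 1000.0 / elapsedMillis;
    }

    static CallStats parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            CallStats stats = new CallStats();
            stats.taskName = text(root, "taskName");
            stats.elapsedMillis = Long.parseLong(text(root, "elapsedTime"));
            stats.callsPerSecond = Double.parseDouble(text(root, "callsPerSecond"));
            stats.percentile99 = Double.parseDouble(text(root, "percentile99"));
            Element calls = (Element) root.getElementsByTagName("calls").item(0);
            stats.allCalls = Long.parseLong(calls.getAttribute("all"));
            return stats;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable callStats XML", e);
        }
    }

    // Trimmed text content of the first descendant element with the given tag.
    private static String text(Element root, String tag) {
        return root.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```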

Eventual Consistency Testing

April 12, 2010

I’ve recently been involved in testing a massively scalable application based on an eventual consistency framework called Voldemort.

The key articles on Dynamo and eventual consistency are:

Dynamo has inspired a variety of frameworks based on distributed hash table principles, such as Cassandra and Voldemort. What these tools, along with other stores such as Mongo, strive to address is the inherent limit to massive scalability with traditional relational databases. Hence the name “No SQL”.

How is Dynamo tested?

All this sounds fine, but the real question is: does this work in real life? Unfortunately Amazon has not released the Dynamo source code, and except that it is written in Java, little is known. As a pragmatic sort of fellow, I am always keen on knowing the nuts and bolts of new-fangled solutions. How does Amazon certify builds? What is Amazon’s test framework for a massively scalable framework such as Dynamo? What sort of tests do they have? How do they specify and test their SLAs? How do they test the intricate quorum-based logic as cluster nodes are brought up and down? I could well imagine the complexity of such a test environment exceeding the complexity of the application itself.
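One concrete property such tests would have to exercise is the quorum rule from the Dynamo paper: with N replicas, choosing read and write quorums with R + W > N guarantees that every read quorum overlaps every write quorum (the paper cites (N, R, W) = (3, 2, 2) as a common configuration). A small, hypothetical QuorumCheck sketch states the rule and verifies it by brute force over all replica subsets:

```java
import java.util.ArrayList;
import java.util.List;

class QuorumCheck {

    // Dynamo-style rule: read and write quorums are guaranteed to overlap when R + W > N.
    static boolean overlapGuaranteed(int n, int r, int w) {
        return r + w > n;
    }

    // Whether an operation needing `required` acknowledgements can succeed
    // when only `liveReplicas` of the N replicas are up.
    static boolean available(int liveReplicas, int required) {
        return liveReplicas >= required;
    }

    // Brute-force check: does every R-subset of N replicas intersect every W-subset?
    // Subsets are encoded as bitmasks, so this is only meant for small N.
    static boolean everyReadSeesEveryWrite(int n, int r, int w) {
        List<Integer> reads = subsets(n, r);
        List<Integer> writes = subsets(n, w);
        for (int readSet : reads) {
            for (int writeSet : writes) {
                if ((readSet & writeSet) == 0) return false; // disjoint quorums
            }
        }
        return true;
    }

    // All bitmasks over n replicas with exactly k bits set.
    private static List<Integer> subsets(int n, int k) {
        List<Integer> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) == k) result.add(mask);
        }
        return result;
    }
}
```

With (3, 2, 2), a read or write still succeeds with one node down, but not with two, which is exactly the kind of scenario that server cycling has to reproduce.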

Embedded and Standalone Execution Contexts

One of the nice things about the Voldemort project is its strong emphasis on modularity and mockable objects. The Voldemort server has the capability of being launched in an embedded mode, and this greatly facilitates many testing scenarios. However, this in no way replaces the need to test against an actual standalone server. Embedded vs. standalone testing is a false dilemma. The vast majority of test cases can and should be run in both modes. Embedded for ease of use, but standalone for truer validation since it more closely approximates the target production environment. So the first step was to create an “Execution Context” object that encapsulated the different bootstrapping logic.

InitContext Interface.

public interface InitContext {
  public void start() throws IOException ;
  public void stop() throws IOException ;
  public VoldemortStoreDao getTestStoreDao() ;
}

EmptyInitContext for a standalone server. Nothing much needs to be done since the server is launched externally.

public class EmptyInitContext implements InitContext
{
  private VoldemortStoreDao storeDao ;

  public EmptyInitContext(VoldemortStoreDao storeDao) {
    this.storeDao = storeDao ;
  }

  public void start() throws IOException {
  }

  public void stop() throws IOException {
  }

  public VoldemortStoreDao getTestStoreDao() {
    return storeDao ;
  }
}

EmbeddedServerInitContext for an embedded Voldemort server that uses an embedded Berkeley DB store.

public class EmbeddedServerInitContext implements InitContext
{
  private VoldemortServer server ;
  private TestConfig testConfig ;
  private VoldemortStoreDao testDao ;
  private boolean useNio = false ;

  public EmbeddedServerInitContext(TestConfig testConfig) {
    this.testConfig = testConfig ;
  }

  public void start() throws IOException {
    String configDir = testConfig.getVoldemortConfigDir();
    String dataDir = testConfig.getVoldemortDataDir();
    int nodeId = 0 ;
    server = new VoldemortServer(
      ServerTestUtils.createServerConfig(useNio, nodeId, dataDir,
        configDir + "/cluster.xml", configDir + "/stores.xml",
        new Properties() ));
    server.start();

    StoreRepository srep = server.getStoreRepository();
    List<? extends Store> stores = srep.getAllLocalStores() ;
    for (Store store : stores) {
      Store lstore = VoldemortTestUtils.getLeafStore(store);
      if (lstore instanceof StorageEngine) {
        if (store.getName().equals(testConfig.getStoreName())) {
          StorageEngine engine = (StorageEngine) lstore ;
          StorageEngineDaoImpl dao = new StorageEngineDaoImpl(engine);
          testDao = dao ;
          break;
        }
      }
    }
  }

  public void stop() throws IOException {
    ServerTestUtils.stopVoldemortServer(server);
  }

  public VoldemortStoreDao getTestStoreDao() {
    return testDao ;
  }
}
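To run the same test body in both modes, a test needs some way of picking a context. The sketch below is self-contained, so it uses simplified, hypothetical stand-ins (`KeyValueDao`, `SimpleInitContext`, `InMemoryContext`, `InitContexts`) rather than the real `VoldemortStoreDao` and `InitContext` types; the idea of selecting the mode from a `test.mode` system property is likewise an assumption, not something Voldemort defines.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, simplified stand-ins for the post's VoldemortStoreDao and InitContext.
interface KeyValueDao {
    void put(String key, String value);
    String get(String key);
}

interface SimpleInitContext {
    void start() throws IOException;
    void stop() throws IOException;
    KeyValueDao getTestStoreDao();
}

final class InitContexts {
    private InitContexts() {}

    /** "standalone" reuses an externally started server; anything else (including null) is embedded. */
    static SimpleInitContext forMode(String mode,
                                     SimpleInitContext embedded,
                                     SimpleInitContext standalone) {
        return "standalone".equalsIgnoreCase(mode) ? standalone : embedded;
    }

    /** Runs one test body against a context and always stops the context afterwards. */
    static void runWith(SimpleInitContext ctx, Consumer<KeyValueDao> body) throws IOException {
        ctx.start();
        try {
            body.accept(ctx.getTestStoreDao());
        } finally {
            ctx.stop();
        }
    }
}

// In-memory stand-in that lets the helpers be exercised without any server.
final class InMemoryContext implements SimpleInitContext {
    private final Map<String, String> data = new HashMap<>();
    boolean running;

    public void start() { running = true; }

    public void stop() { running = false; }

    public KeyValueDao getTestStoreDao() {
        return new KeyValueDao() {
            public void put(String key, String value) { data.put(key, value); }
            public String get(String key) { return data.get(key); }
        };
    }
}
```

With the real types swapped in, a suite could choose the context once from a property such as `-Dtest.mode=standalone` and hand it to every test, so each test runs unchanged in both modes.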

Server Cycling

One important place where embedded and standalone testing logic do differ is server cycling, that is, the starting and stopping of server nodes. This is especially important when testing eventual consistency scenarios. In embedded mode this is no problem since everything executes inside one JVM process. When the servers are separate processes, the problem becomes significantly more difficult. Stopping a remote Voldemort server actually turns out to be easy since Voldemort exposes a JMX MBean with a stop operation. Needless to say, this technique cannot be used to start a server! To launch a server, the test client has to somehow invoke a script on a remote machine. The following steps need to be done:

  • Use Java Runtime.exec to run a script on the remote machine via ssh
  • The script must first check that a server is not already running; if one is, it returns an error
  • The script calls voldemort-server.sh
  • The script waits an indeterminate amount of time to allow the server to start
  • The script invokes “some operation” to ascertain that the server is ready to accept requests

As you can see, each step is fraught with problems. In local embedded mode this entire series of steps is subsumed in a single blocking call that creates a new in-process object. In standalone mode, the wait step is problematic since there is no precise amount of time to wait. And after waiting, how do we determine server liveness? Invoke an operation? That could affect the integrity of the very operation we are testing! One potential solution is to invoke a JMX operation that serves purely as a liveness check. Even assuming all goes well, all of this takes time, and for a large battery of tests the overall execution time increases significantly.
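The launch and wait steps can be sketched in plain Java. This is a minimal sketch under stated assumptions: the host name and start-script path are hypothetical, `ProcessBuilder` stands in for `Runtime.exec` (both are standard JDK APIs), and readiness is a caller-supplied `BooleanSupplier`, for example a JMX attribute read, so that no store operation under test is touched. Polling with a timeout replaces the fixed, indeterminate sleep.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

final class ServerCycling {
    private ServerCycling() {}

    /**
     * Builds "ssh <host> <script>". The script (hypothetical) is expected to refuse
     * to start a second server and to call voldemort-server.sh itself.
     */
    static List<String> sshStartCommand(String host, String scriptPath) {
        return List.of("ssh", host, scriptPath);
    }

    /** Runs a command such as the one above and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    /**
     * Polls the readiness check until it passes (returns true) or the timeout
     * elapses (returns false), sleeping for the given interval between attempts.
     */
    static boolean waitUntilReady(BooleanSupplier isReady, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe deadline check
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

The timeout still has to be chosen, but it becomes an upper bound rather than a guess, so a server that starts quickly no longer costs the full wait on every cycle.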

Eventual Consistency Test Example

Let us look at an example. Assume we have a three-node cluster with N=3, W=2, R=2. N is the number of nodes the data is replicated to, W is the number of writes that must succeed, and R is the number of reads that must succeed. For example, for a write operation the system will try to write the data to all three nodes; if two (or three) succeed, the operation is considered successful.

Get – One Node Down

  • Call put(K,V)
  • For each node in cluster
    • Stop node
    • Call V2=get(K)
    • Assert that V==V2
    • Start node
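The loop above can be exercised without real servers against a toy in-memory model of the N/W/R rule. This is not Voldemort's implementation: `QuorumCluster` and `OneNodeDownTest` are hypothetical names, and the model has no versioning, read repair, or hinted handoff; it only counts how many live nodes answer a put or get.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy quorum store: N replicas, write quorum W, read quorum R. */
final class QuorumCluster {
    private final List<Map<String, String>> nodes = new ArrayList<>();
    private final boolean[] up;
    private final int w;
    private final int r;

    QuorumCluster(int n, int w, int r) {
        for (int i = 0; i < n; i++) {
            nodes.add(new HashMap<>());
        }
        this.up = new boolean[n];
        Arrays.fill(up, true);
        this.w = w;
        this.r = r;
    }

    int size() { return nodes.size(); }
    void stop(int node) { up[node] = false; }
    void start(int node) { up[node] = true; }

    /** Writes to every live node; fails unless at least W nodes accepted the write. */
    void put(String key, String value) {
        int acks = 0;
        for (int i = 0; i < nodes.size(); i++) {
            if (up[i]) {
                nodes.get(i).put(key, value);
                acks++;
            }
        }
        if (acks < w) {
            throw new IllegalStateException("only " + acks + " writes succeeded, need " + w);
        }
    }

    /** Reads from every live node; fails unless at least R nodes responded. */
    String get(String key) {
        int responses = 0;
        String value = null;
        for (int i = 0; i < nodes.size(); i++) {
            if (up[i]) {
                responses++;
                String v = nodes.get(i).get(key);
                if (v != null) {
                    value = v; // no versions here, so any stored copy is taken as current
                }
            }
        }
        if (responses < r) {
            throw new IllegalStateException("only " + responses + " reads succeeded, need " + r);
        }
        return value;
    }
}

final class OneNodeDownTest {
    private OneNodeDownTest() {}

    /** put(K,V); then for each node: stop it, assert get(K) == V, start it again. */
    static void getWithOneNodeDown(QuorumCluster cluster) {
        cluster.put("K", "V");
        for (int node = 0; node < cluster.size(); node++) {
            cluster.stop(node);
            try {
                String v2 = cluster.get("K");
                if (!"V".equals(v2)) {
                    throw new AssertionError("node " + node + " down: expected V, got " + v2);
                }
            } finally {
                cluster.start(node); // restart even if the assertion failed
            }
        }
    }
}
```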

This logic needs to be executed against all thirteen Voldemort operations: get, put, putIfNotObsolete, delete, etc. Whew! Now imagine that the requirements call for testing against two different cluster configurations, 3/2/2 and 5/3/3!
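The two configurations also differ in how many nodes they can lose. A small sketch of the general Dynamo-style quorum arithmetic (a general rule, not a Voldemort API; `QuorumConfig` is a hypothetical helper): when R + W > N every read quorum overlaps every write quorum, and N - W (respectively N - R) nodes can be down while writes (respectively reads) still succeed.

```java
/** Quorum arithmetic for an N/W/R configuration. */
final class QuorumConfig {
    final int n;
    final int w;
    final int r;

    QuorumConfig(int n, int w, int r) {
        if (w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= W, R <= N");
        }
        this.n = n;
        this.w = w;
        this.r = r;
    }

    /** R + W > N: every read quorum shares at least one node with every write quorum. */
    boolean readsOverlapWrites() { return r + w > n; }

    /** Nodes that can be down while a write still gathers W acknowledgements. */
    int writeFailuresTolerated() { return n - w; }

    /** Nodes that can be down while a read still gathers R responses. */
    int readFailuresTolerated() { return n - r; }

    /** Largest number of simultaneously stopped nodes for which both put and get succeed. */
    int maxNodesDown() { return Math.min(writeFailuresTolerated(), readFailuresTolerated()); }

    @Override
    public String toString() { return n + "/" + w + "/" + r; }
}
```

For 3/2/2 this gives one node and for 5/3/3 two nodes, so the larger configuration would arguably also warrant two-nodes-down cases, growing the test matrix further.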

Comparison of JAX-RS Client Proxies

November 18, 2009

Though JAX-RS has done wonders for standardizing the server-side implementation of Java REST services, there is definitely a need for some client-side standardization. Considering that for every REST implementation there are an order of magnitude more clients, it is rather puzzling that more movement hasn’t occurred in this space.

Section 1.3, Non-Goals, of the JAX-RS 1.0 specification (Sep. 2008) states:

Client APIs: The specification will not define client-side APIs. Other specifications are expected to provide such functionality.

Nevertheless, vendors have not been idle and have used their particular client-side frameworks as a value-add selling point. In this post, I report some first impressions of CXF’s and RESTEasy’s client APIs. The next report will cover the Jersey and Restlet versions.

Perhaps the reluctance to tackle a standard client API has something to do with the complexity associated with SOAP client proxies, but in the absence of REST client proxies, every single customer has to reinvent the wheel and implement rather mundane low-level plumbing with httpclient. A chore best avoided.

Both CXF and RESTEasy support a nearly equivalent client proxy API that mirrors the server-side annotated resource implementation. They differ in two ways:

  • Bootstrap proxy creation
  • Exceptions thrown for errors

“Naturally”, each provider has a different way to create a proxy. Both are rather simple, and since they are a one-time action, their impact on the rest of the client code is minimal.

The “happy path” behavior for both implementations is the same – differences arise when exceptions are encountered. RESTEasy uses its own proprietary org.jboss.resteasy.client.ClientResponseFailure exception while CXF manages to use the standard JAX-RS exception javax.ws.rs.WebApplicationException. Therefore, round one goes to CXF since we can write all our client tests using standard JAX-RS packages. In addition, this enables us to leverage these same tests for testing the server implementation – an absolute win-win.

Note that the test examples below use the outstanding TestNG.

Proxy Interface

Here’s the client proxy VideoServiceProxy, which is the same for both CXF and RESTEasy. Very nice, guys!

import javax.ws.rs.Consumes;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

public interface VideoServiceProxy {
    @GET
    @Path("genre")
    @Produces("application/xml")
    public GenreList getGenres();

    @GET
    @Path("genre/{id}/")
    @Produces("application/xml")
    public Genre getGenre(@PathParam("id") String id) ;

    @POST
    @Path("genre")
    @Consumes("application/xml")
    public Response createGenre(Genre genre) ;

    @PUT
    @Path("genre/{id}/")
    @Consumes("application/xml")
    public void updateGenre(@PathParam("id") String id, Genre genre) ;

    @DELETE
    @Path("genre/{id}/")
    public void deleteGenre(@PathParam("id") String id) ;
}

Proxy Bootstrapping

The client side bootstrapping for the proxy is shown below.

CXF Bootstrapping

import org.apache.cxf.jaxrs.client.JAXRSClientFactory;
import org.testng.annotations.BeforeSuite;

public class GenreTest {
    static String url = "http://localhost/vapp/vservice/genre";
    static VideoServiceProxy rservice ;

    @BeforeSuite
    public void initSuite() {
        rservice = JAXRSClientFactory.create(url, VideoServiceProxy.class);
    }
}

RESTEasy Bootstrapping

import org.apache.commons.httpclient.HttpClient;
import org.jboss.resteasy.client.ProxyFactory;
import org.jboss.resteasy.plugins.providers.RegisterBuiltin;
import org.jboss.resteasy.spi.ResteasyProviderFactory;
import org.testng.annotations.BeforeClass;

public class GenreTest {
    private static VideoServiceProxy rservice ;
    private static String url = "http://localhost/vapp/vservice/genre" ;   

    @BeforeClass
    static public void beforeClass() {
        RegisterBuiltin.register(ResteasyProviderFactory.getInstance());
        rservice = ProxyFactory.create(VideoServiceProxy.class, url, new HttpClient());
    }
}

As you can see, the CXF version is slightly less verbose and simpler in that it has fewer imports.

Exception Handling

Happy Path – No Errors

For a happy path the test code is the same for both CXF and RESTEasy.

@Test
public void createGenre()  {
    Genre genre = new Genre();
    genre.setName("Animals");
    genre.setDescription("All animals");
    Response response = rservice.createGenre(genre);
    int status = response.getStatus();
    Assert.assertEquals(status,Response.Status.CREATED.getStatusCode());
    String createdId = ResponseUtils.getCreatedId(response); // get ID from standard Location header
    Assert.assertNotNull(createdId);
}
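`ResponseUtils.getCreatedId` is not shown in the post. One plausible way to implement it is to take the last path segment of the `Location` header; with JAX-RS 1.0 the header value can be obtained via `response.getMetadata().getFirst("Location")` and `toString()`. The sketch below (a hypothetical `LocationIds` helper) covers only the string handling, so it needs nothing beyond `java.net.URI`.

```java
import java.net.URI;

final class LocationIds {
    private LocationIds() {}

    /**
     * Returns the last non-empty path segment of a Location header value, so
     * ".../genre/42" and ".../genre/42/" both yield "42"; returns null if none exists.
     */
    static String createdIdFromLocation(String location) {
        String path = URI.create(location).getPath();
        if (path == null) {
            return null; // opaque URI such as "mailto:..." has no path
        }
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--; // ignore trailing slashes
        }
        if (end == 0) {
            return null;
        }
        int start = path.lastIndexOf('/', end - 1) + 1;
        return path.substring(start, end);
    }
}
```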

CXF Unhappy Path – Error

    @Test
    public void getGenreNonExistent() {
        try {
            Genre genre = rservice.getGenre(nonExistentId);
            Assert.fail();
        }
        catch (WebApplicationException e) {
            Response response = e.getResponse();
            int status = response.getStatus();
            Assert.assertEquals(status, Response.Status.INTERNAL_SERVER_ERROR.getStatusCode()); // TODO: fix 404 not being thrown?
            //Assert.assertEquals(status, Response.Status.NOT_FOUND.getStatusCode());
        }
    }

RESTEasy Unhappy Path – Error

    @Test
    public void getGenreNonExistent() {
        try {
            Genre obj = rservice.getGenre(nonExistentId);
            Assert.fail(); // reaching this line means no error was raised
        }
        catch (ClientResponseFailure e) {
            ClientResponse response = e.getResponse();
            Response.Status status = response.getResponseStatus();
            Assert.assertEquals(status, Response.Status.INTERNAL_SERVER_ERROR); // TODO: fix 404 not being thrown
            //Assert.assertEquals(status, Response.Status.NOT_FOUND);
        }
    }

Note that there is a problem in correctly throwing a 404 with WebApplicationException on the server side. Though my current server implementation is CXF, I have also verified that this problem exists for RESTEasy and Jersey. The framework always returns a 500 even though I specify a 404. This is definitely not OK. It’s a TBD for me to investigate further.

    throw new WebApplicationException(Response.Status.NOT_FOUND);

I definitely plan to check out Jersey and Restlet in more detail, so stay tuned!

JAX-RS: Java™ API for RESTful Web Services, Version 1.0, September 8, 2008