Cassandra Java Annotations

Overview

Cassandra has a unique column-oriented data model which does not easily map to an entity-based Java model. Furthermore, the Java Thrift client implementation is very low-level and presents the developer with a rather difficult API to work with on a daily basis. This situation  is a good candidate for an adapter to shield the business code from mundane plumbing details.

I recently did some intensive Cassandra (version 0.6.5) work to load millions of geographical postions for ships at sea.  Locations were already being stored in MySQL/Innodb using JPA/Hibernate so I already had a ready-made model based on JPA entity beans. After some analysis, I created a mini-framework based on custom annotations and a substantial adapter to encapsulate all the “ugly” Thrift boiler-plate code.  Naturally everything was wired together with Spring.

Implementation

The very first step was to investigate existing Cassandra Java client toolkits. As usual in a startup environment time was at a premium, but I quickly checked out a few key clients. Firstly, I looked at Hector, but its API still exposed too much of the Thrift cruft for my needs. It did have nice features for failover and connection pooling, and I will definitely look at it in more detail in the future. Pelops looked really cool with its Mutators and Selectors, but it too dealt with columns – see the description.  What I was looking for was an object-oriented way to load and query Java beans. Note that this OO entity-like paradigm might not be applicable to other Cassandra data models, e.g. sparse matrices.

And then there was DataNucleus which advertises JPA/JDO implementations for a large variety of non-SQL persistence stores: LDAP, Hadoop Hbase, Google App, etc. There was mention of a Cassandra solution, but it wasn’t yet ready for prime time. How they manage to address the massive semantic mismatch between JPA is beyond me – unfortunately I didn’t have time to drill down. Seems fishy – but I’ll definitely check this out in the future. Even though I’m a big fan of using existing frameworks/tools, there are times when “rolling your own” is the best course of action.

The following collaborating classes comprised the  framework:

  • CassandraDao – High-level class that understands annotated entity beans
  • ColumnFamily – An adapter for common column family operations – hides the Thrift gore
  • AnnotationManager – Manages the annotated beans
  • TypeMapper – Maps Java data types into bytes and vice versa

Since we already had a JPA-annotated Location bean, my first thought was to reuse this class and simply process the the JPA annotations into their equivalent Cassandra concepts. Upon further examination this proved ugly – the semantic mismatch was too great. I certainly did not want to be importing JPA/Hibernate packages into a Cassandra application! Furthermore, many annotations (such as collections) were not applicable and I needed  annotations for Cassandra concepts that did not exist in JPA. In “set theoretic” terms, there are JPA-specific features, Cassandra-specific features and an intersection of the two.

The first-pass implementation required only three annotations: Entity, Column and Key. The Entity annotation is a class-level annotation with keyspace and columnFamily attributes. The Column annotation closely corresponded to its JPA equivalent. The Key annotation specifies the row key. The Entity defines the column family/keyspace  that the entity belongs to and its constituent columns. The CassandraDao class corresponds to a single column family and accepts an entity and type mapper.

Two column families were created: a column family for ship definitions, and a super column family for ship locations. The Ship CF was a simple collection of ship details keyed by each ship’s MMSI (a unique ID for a ship which is typically engraved on the keel).  The Location CF represented a one-to-many relationship for all the possible locations of a ship. The key was the ship’s MMSI, and the column names were Long types representing the millisecond timestamp for the location. The value of the column was a super column – it contained the columns as defined in the ShipLocation bean – latitude, longitude, course over ground, speed over ground, etc.  The number of location for a given ship could possibly range in the millions!

From an implementation perspective, I was rather surprised to find that there are no standard reusable classes to map basic Java data types to bytes. Sure, String has getBytes(), but I had to do some non-trivial distracting detective work to get doubles, longs, BigInteger, BigDecimal and Dates converted – all the shifting magic etc. Also made sure to run some performance tests to choose the best alternative!

CassandraDao

The DAO is based on the standard concept of  a genericized DAO of which many versions are floating around:

The initial version of the DAO with basic CRUD functionality is shown below:

public class CassandraDao<T> {
  public CassandraDao(Class<T> clazz, CassandraClient client, TypeMapper mapper)
  public T get(String key)
  public void insert(T entity)
  public T getSuperColumn(String key, byte[] superColumnName)
  public List<T> getSuperColumns(String key, List<byte[]> superColumnNames)
  public void insertSuperColumn(String key, T entity)
  public void insertSuperColumns(String key, List<T> entities)
 }

Of course more complex batch and range operations that reflect advanced Cassandra API methods are needed.

Usage Sample

  import com.google.common.collect.ImmutableList;
  import org.springframework.context.support.ClassPathXmlApplicationContext;
  import org.springframework.context.ApplicationContext;

  // initialization
  ApplicationContext context = new ClassPathXmlApplicationContext("config.xml");
  CassandraDao<Ship> shipDao = (CassandraDao<Ship>)context.getBean("shipDao");
  CassandraDao<ShipLocation> shipLocationDao =
    (CassandraDao<ShipLocation>)context.getBean("shipLocationDao");
  TypeMapper mapper = (DefaultTypeMapper)applicationContext.getBean("typeMapper");

  // get ship
  Ship ship = shipDao.get("1975");

  // insert ship
  Ship ship = new Ship();
  ship.setMmsi(1975); // note: row key - framework insert() converts to required String
  ship.setName("Hokulea");
  shipDao.insert(ship);

  // get ship location (super column)
  byte [] superColumn = typeMapper.toBytes(1283116367653L));
  ShipLocation location = shipLocationDao.getSuperColumn("1975",superColumn);

  // get ship locations (super column)
  ImmutableList<byte[]> superColumns = ImmutableList.of( // Until Java 7, Google rocks!
    typeMapper.toBytes(1283116367653L),
    typeMapper.toBytes(1283116913738L),
    typeMapper.toBytes(1283116977580L));
  List<ShipLocation> locations = shipLocationDao.getSuperColumns("1975",superColumns);

  // insert ship location (super column)
  ShipLocation location = new ShipLocation();
  location.setTimestamp(new Date());
  location.setLat(20);
  location.setLon(-90);
  shipLocationDao.insertSuperColumn("1775",location);

Java Entity Beans

Ship

@Entity( keyspace="Marine", columnFamily="Ship")
public class Ship {
  private Integer mmsi;
  private String name;
  private Integer length;
  private Integer width;

  @Key
  @Column(name = "mmsi")
  public Integer getMmsi() {return this.mmsi;}
  public void setMmsi(Integer mmsi) {this.mmsi= mmsi;}

  @Column(name = "name")
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
}

ShipLocation

@Entity( keyspace="Marine", columnFamily="ShipLocation")
public class ShipLocation {
  private Integer mmsi;
  private Date timestamp;
  private Double lat;
  private Double lon;

  @Key
  @Column(name = "mmsi")
  public Integer getMmsi() {return this.mmsi;}
  public void setMmsi(Integer mmsi) {this.mmsi= mmsi;}

  @Column(name = "timestamp")
  public Date getTimestamp() {return this.timestamp;}
  public void setTimestamp(Date timestamp) {this.timestamp = msgTimestamp;}

  @Column(name = "lat")
  public Double getLat() {return this.lat;}
  public void setLat(Double lat) {this.lat = lat;}

  @Column(name = "lon")
  public Double getLon() {return this.lon;}
  public void setLon(Double lon) {this.lon = lon;}
}

Spring Configuration

 <bean id="propertyConfigurer">
   <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE" />
   <property name="location" value="classpath:config.properties</value>
 </bean>
 <bean id="shipDao" class="com.andre.cassandra.dao.CassandraDao" scope="prototype" >
   <constructor-arg value="com.andre.cassandra.data.Ship" />
   <constructor-arg ref="cassandraClient" />
   <constructor-arg ref="typeMapper" />
 </bean>
 <bean id="shipLocationDao" scope="prototype" >
   <constructor-arg value="com.andre.cassandra.data.ShipLocation" />
   <constructor-arg ref="cassandraClient" />
   <constructor-arg ref="typeMapper" />
 </bean>

<bean id="cassandraClient" class="com.andre.cassandra.util.CassandraClient" scope="prototype" >
  <constructor-arg value="${cassandra.host}" />
  <constructor-arg value="${cassandra.port}" />
</bean>

<bean id="typeMapper" class="com.andre.cassandra.util.DefaultTypeMapper" scope="prototype" />

Annotation Documentation

Annotations

Annotation Class/Field Description
Entity Class Defines the keyspace and column family
Column Field Column name
Key Field Row key

Entity Attributes

Attribute Type Description
keyspace String Keyspace
columnFamily String Column Family

5 Responses to “Cassandra Java Annotations”

  1. Marc C Says:

    Hi,

    Do you have the source code for this implementation available anywhere?
    I am looking at creating a similar Generic DAO and would like to look at your implementation for ideas.

    Thanks,
    Marc

    • amesar Says:

      Sorry about the delay. For implementation, I used the raw Thrift Java code and I wouldn’t wish that on my worse enemy. If I had to do it over I’d certainly use Hector. I’ve got the source code buried away – I could polish it up and send it your way if you’re still interested.

  2. jetlag Says:

    do you have a code project or zip somewhere?

  3. Amresh Says:

    You may want to try Kundera which is JPA based ORM library for Cassandra/ HBase and MongoDB.
    Link: http://github.com/impetus-opensource/Kundera

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s


%d bloggers like this: