Overview
Cassandra has a unique column-oriented data model which does not easily map to an entity-based Java model. Furthermore, the Java Thrift client implementation is very low-level and presents the developer with a rather difficult API to work with on a daily basis. This situation is a good candidate for an adapter to shield the business code from mundane plumbing details.
I recently did some intensive Cassandra (version 0.6.5) work to load millions of geographical postions for ships at sea. Locations were already being stored in MySQL/Innodb using JPA/Hibernate so I already had a ready-made model based on JPA entity beans. After some analysis, I created a mini-framework based on custom annotations and a substantial adapter to encapsulate all the “ugly” Thrift boiler-plate code. Naturally everything was wired together with Spring.
Implementation
The very first step was to investigate existing Cassandra Java client toolkits. As usual in a startup environment time was at a premium, but I quickly checked out a few key clients. Firstly, I looked at Hector, but its API still exposed too much of the Thrift cruft for my needs. It did have nice features for failover and connection pooling, and I will definitely look at it in more detail in the future. Pelops looked really cool with its Mutators and Selectors, but it too dealt with columns – see the description. What I was looking for was an object-oriented way to load and query Java beans. Note that this OO entity-like paradigm might not be applicable to other Cassandra data models, e.g. sparse matrices.
And then there was DataNucleus which advertises JPA/JDO implementations for a large variety of non-SQL persistence stores: LDAP, Hadoop Hbase, Google App, etc. There was mention of a Cassandra solution, but it wasn’t yet ready for prime time. How they manage to address the massive semantic mismatch between JPA is beyond me – unfortunately I didn’t have time to drill down. Seems fishy – but I’ll definitely check this out in the future. Even though I’m a big fan of using existing frameworks/tools, there are times when “rolling your own” is the best course of action.
The following collaborating classes comprised the framework:
- CassandraDao – High-level class that understands annotated entity beans
- ColumnFamily – An adapter for common column family operations – hides the Thrift gore
- AnnotationManager – Manages the annotated beans
- TypeMapper – Maps Java data types into bytes and vice versa
Since we already had a JPA-annotated Location bean, my first thought was to reuse this class and simply process the the JPA annotations into their equivalent Cassandra concepts. Upon further examination this proved ugly – the semantic mismatch was too great. I certainly did not want to be importing JPA/Hibernate packages into a Cassandra application! Furthermore, many annotations (such as collections) were not applicable and I needed annotations for Cassandra concepts that did not exist in JPA. In “set theoretic” terms, there are JPA-specific features, Cassandra-specific features and an intersection of the two.
The first-pass implementation required only three annotations: Entity, Column and Key. The Entity annotation is a class-level annotation with keyspace and columnFamily attributes. The Column annotation closely corresponded to its JPA equivalent. The Key annotation specifies the row key. The Entity defines the column family/keyspace that the entity belongs to and its constituent columns. The CassandraDao class corresponds to a single column family and accepts an entity and type mapper.
Two column families were created: a column family for ship definitions, and a super column family for ship locations. The Ship CF was a simple collection of ship details keyed by each ship’s MMSI (a unique ID for a ship which is typically engraved on the keel). The Location CF represented a one-to-many relationship for all the possible locations of a ship. The key was the ship’s MMSI, and the column names were Long types representing the millisecond timestamp for the location. The value of the column was a super column – it contained the columns as defined in the ShipLocation bean – latitude, longitude, course over ground, speed over ground, etc. The number of location for a given ship could possibly range in the millions!
From an implementation perspective, I was rather surprised to find that there are no standard reusable classes to map basic Java data types to bytes. Sure, String has getBytes(), but I had to do some non-trivial distracting detective work to get doubles, longs, BigInteger, BigDecimal and Dates converted – all the shifting magic etc. Also made sure to run some performance tests to choose the best alternative!
CassandraDao
The DAO is based on the standard concept of a genericized DAO of which many versions are floating around:
- Build a generic typesafe DAO with Hibernate and Spring AOP – IBM developerWorks
- To a generic hibernate example DAO – codeweblog
- Krank framework -Rick Hightower
The initial version of the DAO with basic CRUD functionality is shown below:
public class CassandraDao<T> { public CassandraDao(Class<T> clazz, CassandraClient client, TypeMapper mapper) public T get(String key) public void insert(T entity) public T getSuperColumn(String key, byte[] superColumnName) public List<T> getSuperColumns(String key, List<byte[]> superColumnNames) public void insertSuperColumn(String key, T entity) public void insertSuperColumns(String key, List<T> entities) }
Of course more complex batch and range operations that reflect advanced Cassandra API methods are needed.
Usage Sample
import com.google.common.collect.ImmutableList; import org.springframework.context.support.ClassPathXmlApplicationContext; import org.springframework.context.ApplicationContext; // initialization ApplicationContext context = new ClassPathXmlApplicationContext("config.xml"); CassandraDao<Ship> shipDao = (CassandraDao<Ship>)context.getBean("shipDao"); CassandraDao<ShipLocation> shipLocationDao = (CassandraDao<ShipLocation>)context.getBean("shipLocationDao"); TypeMapper mapper = (DefaultTypeMapper)applicationContext.getBean("typeMapper"); // get ship Ship ship = shipDao.get("1975"); // insert ship Ship ship = new Ship(); ship.setMmsi(1975); // note: row key - framework insert() converts to required String ship.setName("Hokulea"); shipDao.insert(ship); // get ship location (super column) byte [] superColumn = typeMapper.toBytes(1283116367653L)); ShipLocation location = shipLocationDao.getSuperColumn("1975",superColumn); // get ship locations (super column) ImmutableList<byte[]> superColumns = ImmutableList.of( // Until Java 7, Google rocks! typeMapper.toBytes(1283116367653L), typeMapper.toBytes(1283116913738L), typeMapper.toBytes(1283116977580L)); List<ShipLocation> locations = shipLocationDao.getSuperColumns("1975",superColumns); // insert ship location (super column) ShipLocation location = new ShipLocation(); location.setTimestamp(new Date()); location.setLat(20); location.setLon(-90); shipLocationDao.insertSuperColumn("1775",location);
Java Entity Beans
Ship
@Entity( keyspace="Marine", columnFamily="Ship") public class Ship { private Integer mmsi; private String name; private Integer length; private Integer width; @Key @Column(name = "mmsi") public Integer getMmsi() {return this.mmsi;} public void setMmsi(Integer mmsi) {this.mmsi= mmsi;} @Column(name = "name") public String getName() { return name; } public void setName(String name) { this.name = name; } }
ShipLocation
@Entity( keyspace="Marine", columnFamily="ShipLocation") public class ShipLocation { private Integer mmsi; private Date timestamp; private Double lat; private Double lon; @Key @Column(name = "mmsi") public Integer getMmsi() {return this.mmsi;} public void setMmsi(Integer mmsi) {this.mmsi= mmsi;} @Column(name = "timestamp") public Date getTimestamp() {return this.timestamp;} public void setTimestamp(Date timestamp) {this.timestamp = msgTimestamp;} @Column(name = "lat") public Double getLat() {return this.lat;} public void setLat(Double lat) {this.lat = lat;} @Column(name = "lon") public Double getLon() {return this.lon;} public void setLon(Double lon) {this.lon = lon;} }
Spring Configuration
<bean id="propertyConfigurer"> <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE" /> <property name="location" value="classpath:config.properties</value> </bean> <bean id="shipDao" class="com.andre.cassandra.dao.CassandraDao" scope="prototype" > <constructor-arg value="com.andre.cassandra.data.Ship" /> <constructor-arg ref="cassandraClient" /> <constructor-arg ref="typeMapper" /> </bean> <bean id="shipLocationDao" scope="prototype" > <constructor-arg value="com.andre.cassandra.data.ShipLocation" /> <constructor-arg ref="cassandraClient" /> <constructor-arg ref="typeMapper" /> </bean> <bean id="cassandraClient" class="com.andre.cassandra.util.CassandraClient" scope="prototype" > <constructor-arg value="${cassandra.host}" /> <constructor-arg value="${cassandra.port}" /> </bean> <bean id="typeMapper" class="com.andre.cassandra.util.DefaultTypeMapper" scope="prototype" />
Annotation Documentation
Annotations
Annotation | Class/Field | Description |
Entity | Class | Defines the keyspace and column family |
Column | Field | Column name |
Key | Field | Row key |
Entity Attributes
Attribute | Type | Description |
keyspace | String | Keyspace |
columnFamily | String | Column Family |