Often the first error handling code I see from new Cassandra users is a client-side rollback, an attempt to replicate the database transactions of the ACID world. This typically happens when a write to multiple tables fails, or a write is not able to meet the requested consistency level. Unfortunately, client-side rollbacks, while well intentioned, generally create more problems than they solve. Let's discuss how this plays out.
The Distributed Problem With Rollbacks
This is a sample of the kind of code I see (obviously not with hardcoded values):
session.execute("INSERT INTO my_db.events
    (id, text, ts)
    values (1, 'added record', '2015-01-23 11:00:01.923')");
session.execute("INSERT INTO my_db.user_events
    (user_id, id) values (100, 1)");
// attempted "rollback" if the second write fails
session.execute("DELETE FROM my_db.events
    WHERE id = 1");
The problem with this is that several servers are responsible for this source of truth, and in a working system this code will usually work fine with a sleep in between the operations. However, Thread.sleep(100) is rarely a safe approach in practice, or remotely professional. So what do you do?
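To see why the compensating delete is not a real rollback, consider a minimal sketch in plain Java, with in-memory maps standing in for the two tables and simulated timeouts in place of real driver exceptions (all names here are illustrative, not driver API):

```java
import java.util.HashMap;
import java.util.Map;

public class RollbackDemo {
    static Map<Integer, String> events = new HashMap<>();
    static Map<Integer, Integer> userEvents = new HashMap<>();

    static void attemptTransaction() {
        events.put(1, "added record");
        try {
            // simulate the second write timing out
            throw new RuntimeException("write timeout");
        } catch (RuntimeException writeFailure) {
            try {
                // the "rollback" delete can time out too, and a timed-out
                // write may still be applied by the cluster later anyway
                throw new RuntimeException("delete timeout");
            } catch (RuntimeException deleteFailure) {
                // left with partial state: events has the row,
                // user_events does not, and nothing will reconcile them
            }
        }
    }

    public static void main(String[] args) {
        attemptTransaction();
        System.out.println("events=" + events + " user_events=" + userEvents);
    }
}
```

The point is that the cleanup step is itself just another distributed write that can fail, so "rollback" only shrinks the window of inconsistency rather than closing it.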
Retry: The Distributed Alternative To Rollbacks
The typical approach for experienced Cassandra users is to retry the operation, and most drivers will even do this by default for timeouts. Conceptually, if done by hand, the code would look more like this:
String query = "INSERT INTO my_db.user_events "
             + "(user_id, id) values (100, 1)";
int attemptsLeft = 3;
while (attemptsLeft-- > 0) {
    try {
        session.execute(query);
        break;
    } catch (WriteTimeoutException e) {
        // optionally you can even attempt a circuit breaker pattern
        // and write to a backup location such as a queue to retry
        // later; only useful in the most extreme or most limited cases
    }
}
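The circuit-breaker idea mentioned in the comments can be sketched roughly like this, in plain Java with a stand-in for session.execute; MAX_FAILURES, tryWrite, and the retry queue are illustrative assumptions, not driver API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WriteBreaker {
    static final int MAX_FAILURES = 3;
    static int consecutiveFailures = 0;
    static Deque<String> retryQueue = new ArrayDeque<>();

    // stand-in for session.execute(query); returns false on timeout
    static boolean tryWrite(String query) {
        return false; // simulate a cluster that keeps timing out
    }

    static void write(String query) {
        if (consecutiveFailures >= MAX_FAILURES) {
            // breaker is open: stop hammering the struggling cluster
            // and park the write somewhere durable to replay later
            retryQueue.add(query);
            return;
        }
        if (tryWrite(query)) {
            consecutiveFailures = 0;
        } else {
            consecutiveFailures++;
            retryQueue.add(query);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            write("INSERT INTO my_db.user_events (user_id, id) values (100, 1)");
        }
        System.out.println("queued=" + retryQueue.size());
    }
}
```

The design point is that after a few consecutive failures you stop adding load to an already unhealthy cluster and fall back to the queue, replaying it once writes start succeeding again.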
So let's talk about the practical application of these theories in our data model.
There are no updates to any writes beyond retries; another name for this is Event Sourcing. The number of updates will determine how effective this is, and there is a whole raft of other considerations when it comes to partition sizing. This works really well with the Lambda Architecture, as different analytics tools can combine these separate events into aggregate views.
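A minimal sketch of the event-sourcing shape, with an append-only list standing in for the events table and current state derived by replaying it (the names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class EventSourcing {
    // append-only log: rows are only ever inserted, never updated
    static List<String> events = new ArrayList<>();

    static void append(String text) {
        events.add(text);
    }

    // current state is derived by replaying the log,
    // not by mutating rows in place
    static String currentState() {
        return events.isEmpty() ? "" : events.get(events.size() - 1);
    }

    public static void main(String[] args) {
        append("created");
        append("renamed");
        System.out.println(currentState()); // renamed
    }
}
```

Because nothing is ever updated in place, a replayed insert never clobbers newer state; it just lands back in the log.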
Retries of the same value will have the same result no matter what. This makes the model safely idempotent and free from race conditions that could result in a permanently inconsistent state.
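As a sketch of that idempotence claim: an insert keyed by the same primary key with the same value is a plain overwrite, so replaying it any number of times converges to the same state (an in-memory map stands in for the table here):

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentRetry {
    static Map<Integer, String> events = new HashMap<>();

    static void insertEvent(int id, String text) {
        // same primary key, same value: a retry is a no-op overwrite
        events.put(id, text);
    }

    public static void main(String[] args) {
        insertEvent(1, "added record");
        insertEvent(1, "added record"); // retried after a timeout
        insertEvent(1, "added record"); // retried again
        System.out.println(events); // still exactly one row
    }
}
```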
Your timestamps should be based on the time of the update; this allows you to retry with a huge time difference and still get an accurate result.
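One way to pin a write to the time of the update is CQL's USING TIMESTAMP clause, supplying the event time in microseconds from the client instead of letting each retry pick a new server-side timestamp. A sketch (the timestamp value is illustrative):

```java
public class ClientTimestamp {
    static String insertWithTimestamp(long eventTimeMicros) {
        // the same microsecond value is reused on every retry, so
        // last-write-wins resolution treats the retry as the same write
        return "INSERT INTO my_db.events (id, text, ts) "
             + "values (1, 'added record', '2015-01-23 11:00:01.923') "
             + "USING TIMESTAMP " + eventTimeMicros;
    }

    public static void main(String[] args) {
        long eventTimeMicros = 1422010801923000L; // captured once, at update time
        String first = insertWithTimestamp(eventTimeMicros);
        String retry = insertWithTimestamp(eventTimeMicros);
        System.out.println(first.equals(retry)); // true: the retry is identical
    }
}
```

The key is to capture the timestamp once, at update time, and carry it through every retry attempt; regenerating it per attempt reintroduces the race.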
You must be aware of partition width. An upcoming blog post will discuss partition sizing; for now, the rule of thumb is to limit a partition to 100k items and 32 MB. These are not remotely hard and fast rules (more recent versions of Cassandra are happy with a lot more, and different query styles can tolerate larger partitions), but they are good guidelines to start with if you're new. Use nodetool cfhistograms <keyspace> <table> to get these numbers on a given node.
Beware of models that use a server-side generated timeuuid: each attempt generates a new timeuuid, so a retry will result in LOTS of different timeuuids and will never give you the same result twice.
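The effect is easy to demonstrate with any generated UUID: two generations are never equal, so a retried insert keyed by a server-generated value creates a second row instead of replaying the first. Here java.util.UUID.randomUUID() is used as a stand-in for Cassandra's server-side timeuuid generation:

```java
import java.util.UUID;

public class TimeuuidRetry {
    public static void main(String[] args) {
        // each "retry" gets a brand new key, so instead of one row
        // converging to one state, every attempt creates a new row
        UUID firstAttempt = UUID.randomUUID();
        UUID retry = UUID.randomUUID();
        System.out.println(firstAttempt.equals(retry)); // false
    }
}
```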
I hope this has given the reader enough ammo to start building out idempotent data models that fit in line with distributed principles and lead to a well-understood, well-behaved application that tolerates all sorts of failure modes in a consistent and easy-to-understand way.
About Ryan Svihla
I consider myself a full-stack polyglot, and I have been writing a lot of JS and Ruby as of late. Currently, I'm a solutions architect at DataStax.