Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

gbrits gbrits at gmail.com
Wed Jul 17 07:38:07 EDT 2013

Somewhere (I can't find it now) I've read that Riak, like Cassandra, could be
classified as a column store.

This is just a name of course, but what I understand from Cassandra is that
it allows for space-efficient encoding of column values. Basically, storage
is organized around columns instead of rows, allowing for different
persistence strategies on a per-column, or per-column-family, basis. Moreover,
it allows for zero storage overhead for non-existent column values,
i.e. efficient storage of sparse data sets.
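
To make "zero storage overhead for non-existent column values" concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory model only, not Cassandra's or Riak's actual storage format, and the `SparseTable` class and its method names are made up for illustration.

```python
# Toy model of sparse rows: only cells that exist are stored.
# Hypothetical names; this does not reflect any real on-disk format.
from typing import Dict, Optional


class SparseTable:
    def __init__(self) -> None:
        # Maps row key -> {column name -> value}; absent columns take no entry.
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, key: str, column: str, value: bytes) -> None:
        self._rows.setdefault(key, {})[column] = value

    def get(self, key: str, column: str) -> Optional[bytes]:
        # Returns None for a column that was never written for this key.
        return self._rows.get(key, {}).get(column)

    def stored_cells(self) -> int:
        # Counts only the cells that were actually written.
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("key-1", "col-7", b"\x00" * 4096)
table.put("key-1", "col-42", b"\x01" * 4096)
table.put("key-2", "col-7", b"\x02" * 4096)
```

Here two keys and two distinct columns give four possible cells, but only the three that were written occupy space.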

Does Riak have this property as well?

More specifically, I've got a data structure on paper with the following
properties, when mapped to Riak nomenclature:

- ~1,000,000 keys (will not grow)
- ~1,000 columns (may grow)
- A given key has a median of ~50 columns. In other words, the entire
set is ~95% sparse.
- Wherever a key has a value for a particular column, that value is always
a String (base 255) of exactly 4 KB.
- The 4 KB values themselves are pretty 'sparse', so they would benefit a lot
from run-length encoding. Is this supported out of the box?
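
To show what I mean by run-length encoding the values, here is a minimal sketch of doing it on the client side before writing. This is an assumption about how it could be handled if the store does not do it, not a description of anything Riak provides; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
# Minimal byte-level run-length encoding, as a client-side sketch.
# Hypothetical helpers; not part of any Riak client library.
from typing import List, Tuple


def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Collapse consecutive equal bytes into (byte_value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            runs[-1] = (byte, runs[-1][1] + 1)
        else:
            runs.append((byte, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([byte]) * length for byte, length in runs)


# A made-up 4096-byte value that is mostly one repeated byte.
value = b"\x00" * 4000 + b"\x07" * 96
runs = rle_encode(value)
```

For a value like this the 4096 bytes collapse into two runs, and `rle_decode(runs)` restores the original.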

Given these properties, how would Riak hold up? Hard to say of course, but
I'm looking for some general advice.
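
For a rough sense of scale, the figures above work out as follows. This is back-of-the-envelope arithmetic only: it assumes 4 KB means 4096 bytes, uses the median of ~50 columns as a stand-in for the mean, and ignores any per-key or per-object overhead and any compression.

```python
# Rough size estimate from the figures in this post.
# Assumptions: 4 KB = 4096 bytes; median columns per key used as the mean.
keys = 1_000_000
total_columns = 1_000
columns_per_key = 50
value_bytes = 4 * 1024

# Fraction of (key, column) cells that have no value.
sparsity = 1 - columns_per_key / total_columns

# Raw payload if only existing cells are stored.
sparse_bytes = keys * columns_per_key * value_bytes

# Raw payload if every cell were stored, empty or not.
dense_bytes = keys * total_columns * value_bytes

print(f"sparsity:     {sparsity:.0%}")
print(f"sparse store: {sparse_bytes / 1e9:.1f} GB")
print(f"dense store:  {dense_bytes / 1e12:.3f} TB")
```

So whether empty cells cost anything is the difference between roughly 0.2 TB and roughly 4 TB of raw values, before any run-length encoding.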


