Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

gbrits gbrits at gmail.com
Wed Jul 17 07:38:07 EDT 2013


Somewhere (can't find it now) I've read that Riak, like Cassandra, could be
classified as a column store.

This is just a name of course, but what I understand from Cassandra is that
this allows for space-efficient encoding of column values. Basically, storage
is organized around columns instead of rows, allowing for different
persistence strategies on a per-column, or per-column-family, basis. Moreover,
it would allow for zero storage overhead for non-existent column values,
i.e. it would allow for efficient storage of sparse data sets.
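
To make "zero storage overhead for non-existent column values" concrete, here is a toy in-memory sketch. It is only an illustration of the idea and says nothing about the actual on-disk format of Cassandra or Riak; the column numbers and values are made up.

```python
# Toy model only: NOT the on-disk layout of Cassandra or Riak.
COLUMN_COUNT = 1000  # assumed number of possible columns

# Dense row: one slot per possible column, whether it holds a value or not.
dense_row = [None] * COLUMN_COUNT
dense_row[7] = "a"
dense_row[42] = "b"

# Sparse row: only the columns that actually have a value are stored.
sparse_row = {7: "a", 42: "b"}

dense_slots = len(dense_row)    # slots kept for the dense form
sparse_slots = len(sparse_row)  # slots kept for the sparse form
print(dense_slots, sparse_slots)
```

In the sparse form a missing column simply has no entry, which is the property being asked about.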

Does Riak have this property as well?

More specifically, I've got a data structure on paper with the following
properties, when mapped to Riak nomenclature:

- ~ 1.000.000 keys (will not grow)
- ~ 1.000 columns (may grow)
- Keys have a median of ~50 columns each. In other words, the entire
  set is ~95% sparse.
- Wherever a key has a value for a particular column, that value is always
  exactly a String (base 255) of 4KB length.
- The 4KB values themselves are pretty 'sparse', so they would benefit a lot
  from run-length encoding. Is this supported out of the box?
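
For reference, this is the kind of run-length encoding meant in the last point, as a minimal client-side sketch. The function names and the sample value are hypothetical, and nothing here is a Riak feature; it only shows what would happen to a mostly-empty 4KB value before it is written.

```python
from itertools import groupby

def rle_encode(data: bytes) -> list:
    """Collapse runs of identical bytes into (byte_value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]

def rle_decode(pairs) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * length for value, length in pairs)

# Hypothetical 4KB value: mostly zero bytes with one short non-zero run.
value = b"\x00" * 2000 + b"\x07" * 96 + b"\x00" * 2000
encoded = rle_encode(value)

print(len(value), len(encoded))
```

The round trip `rle_decode(rle_encode(value))` gives back the original bytes, so the encoding is lossless; how much it saves depends entirely on how long the runs in the real values are.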

Given these properties, how would Riak hold up? Hard to say of course, but
I'm looking for some general advice.
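
As a back-of-the-envelope check on the figures above, the arithmetic can be sketched as follows. It assumes the median of ~50 columns per key can stand in for the mean, takes 4KB as 4096 bytes, and ignores replication, per-object overhead and any compression, so the result is a rough order of magnitude only.

```python
KEYS = 1_000_000           # number of keys (fixed)
COLUMNS = 1_000            # number of possible columns (may grow)
COLUMNS_PER_KEY = 50       # assumption: median used as if it were the mean
VALUE_BYTES = 4 * 1024     # assumption: 4KB = 4096 bytes

fill_ratio = COLUMNS_PER_KEY / COLUMNS        # fraction of cells with a value
sparsity = 1 - fill_ratio                     # fraction of cells that are empty
populated_cells = KEYS * COLUMNS_PER_KEY      # cells that actually hold a value
sparse_bytes = populated_cells * VALUE_BYTES  # raw payload if empties cost nothing
dense_bytes = KEYS * COLUMNS * VALUE_BYTES    # raw payload if every cell were stored

print(f"sparsity: {sparsity:.0%}")
print(f"populated cells: {populated_cells:,}")
print(f"sparse payload: {sparse_bytes / 1e9:.1f} GB")
print(f"dense payload: {dense_bytes / 1e12:.3f} TB")
```

Under these assumptions the raw payload is roughly 200 GB when only existing values are stored, versus about 4 TB if every cell took up 4KB.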

Thanks. 




--
View this message in context: http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
Sent from the Riak Users mailing list archive at Nabble.com.



