Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset
sean at basho.com
Wed Jul 17 08:53:57 EDT 2013
Just to add to Jeremiah's comments, I think you should consider whether you
will be mostly retrieving:
1) all 1000 columns
2) some subset of columns
3) single columns
That will greatly influence how you design your keyspace. Remember, with
Riak it's just key-value in the end. This is one of my favorite examples of
building a column-like system on top of pure key-value, Boundary's
"Kobayashi" system: https://vimeo.com/42902962
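
To make the three access patterns concrete, here is a minimal sketch of two possible key layouts. It is only an illustration: a plain Python dict stands in for a Riak bucket, and the key formats ("row/<id>" and "cell/<id>/<column>") and helper names are made up for this example, not taken from Riak or from Kobayashi.

```python
import json

# A plain dict stands in for a Riak bucket: opaque bytes in, opaque bytes out.
store = {}

def put_row(row_id, columns):
    """Layout A: one object per row, holding only the columns that exist."""
    store[f"row/{row_id}"] = json.dumps(columns).encode("utf-8")

def get_row(row_id):
    """Fetch every populated column of a row with a single read."""
    return json.loads(store[f"row/{row_id}"].decode("utf-8"))

def put_cell(row_id, column, value):
    """Layout B: one object per (row, column) cell."""
    store[f"cell/{row_id}/{column}"] = value.encode("utf-8")

def get_cell(row_id, column):
    """Fetch a single column without reading the rest of the row."""
    return store[f"cell/{row_id}/{column}"].decode("utf-8")

# Same data written under both layouts.
put_row("k1", {"c7": "aaaa", "c42": "bbbb"})
put_cell("k1", "c7", "aaaa")
put_cell("k1", "c42", "bbbb")

row = get_row("k1")           # case 1: all columns, one read
cell = get_cell("k1", "c42")  # case 3: one column, one small read
```

Layout A suits "all columns" reads, layout B suits single-column reads, and "some subset" sits in between (several cell reads, or one row read that is filtered on the client).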
On Wed, Jul 17, 2013 at 7:25 AM, Jeremiah Peschka <
jeremiah.peschka at gmail.com> wrote:
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
> On Jul 17, 2013, at 4:38 AM, gbrits <gbrits at gmail.com> wrote:
> > Somewhere (can't find it now) I've read that Riak, like Cassandra, could
> > be classified as a column store.
> That is incorrect. Riak is a key value database where the value is an
> opaque blob.
> > This is just a name of course, but what I understand from Cassandra is that
> > this allows for space-efficient encoding of column values. Basically, storage
> > is organized around columns instead of rows, allowing for different
> > persistence strategies on a per-column, or per-column-family, basis.
> > It would also allow for zero storage overhead for non-existent column values,
> > i.e. efficient storage of sparse data sets.
> > Does Riak have this property as well?
> No. Riak will happily store whatever you throw at it. That being said,
> most good serialization libraries will leave off nullable properties.
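
As a rough illustration of "leaving off nullable properties", the sketch below drops empty columns before encoding, so a missing column costs no bytes in the stored blob. JSON and the helper names are assumptions chosen for the example; any serialization format with optional fields would do.

```python
import json

def serialize_sparse(record):
    """Encode only the columns that have a value; missing columns cost nothing."""
    present = {name: value for name, value in record.items() if value is not None}
    return json.dumps(present, sort_keys=True).encode("utf-8")

def deserialize_sparse(blob, all_columns):
    """Rebuild the full record, filling absent columns with None."""
    present = json.loads(blob.decode("utf-8"))
    return {name: present.get(name) for name in all_columns}

columns = ["c1", "c2", "c3", "c4"]
record = {"c1": None, "c2": "x", "c3": None, "c4": "y"}

blob = serialize_sparse(record)               # only c2 and c4 are written
restored = deserialize_sparse(blob, columns)  # c1 and c3 come back as None
```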
> > More specifically, I've got a datastructure on paper with the following
> > properties, when mapped to Riak nomenclature:
> > - ~ 1.000.000 keys (will not grow)
> > - ~ 1.000 columns. (may grow)
> > - Any particular key has a median of ~50 columns. In other words, the
> > entire set is ~95% sparse.
> > - Wherever a key has a value for a particular column, that value is
> > exactly a String (base 255) of 4KB length.
> > - The 4KB values themselves are pretty 'sparse', so they would benefit a lot
> > from run-length encoding. Is this supported out of the box?
> See above.
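
Because the value is an opaque blob as far as Riak is concerned, one option is to compress it on the client before writing. The sketch below uses zlib from the Python standard library; note that this is DEFLATE rather than run-length encoding proper, and the sample value is invented to mimic a mostly empty 4KB string.

```python
import zlib

# Hypothetical 4096-byte value: mostly one repeated byte, plus a little real data.
value = b"\x00" * 4000 + b"data" * 24

compressed = zlib.compress(value)      # what would be written to the store
restored = zlib.decompress(compressed) # what the client recovers after a read
```

Long runs of a repeated byte compress very well this way, so a "sparse" value should shrink to a small fraction of its original size.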
> > Given these properties how would Riak hold up? Hard to say of course, but
> > I'm looking for some general advice.
> Riak objects should be no more than ~10MB for performance reasons. You
> should be safe.
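
A back-of-the-envelope check of "you should be safe", assuming one object per key that holds all of that key's columns. The figures come from the question above; treating the median as the typical case, reading 4KB as 4096 bytes, and ignoring serialization overhead and replication are simplifying assumptions.

```python
VALUE_BYTES = 4 * 1024                 # one column value, 4KB
MEDIAN_COLUMNS = 50                    # populated columns for a typical key
TOTAL_COLUMNS = 1_000                  # upper bound today (may grow)
KEYS = 1_000_000
OBJECT_LIMIT_BYTES = 10 * 1024 * 1024  # the ~10MB guideline

typical_object = MEDIAN_COLUMNS * VALUE_BYTES  # 204,800 bytes, about 200KB
worst_object = TOTAL_COLUMNS * VALUE_BYTES     # 4,096,000 bytes, about 4MB
rough_total = KEYS * typical_object            # about 205GB before replication

print(typical_object, worst_object, rough_total)
```

Even a key with every one of the 1,000 columns populated stays under the guideline; it would take roughly 2,500 fully populated 4KB columns on a single key to reach it.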
> > Thanks.
Sean Cribbs <sean at basho.com>
Basho Technologies, Inc.