High number of Riak buckets

Vikram Lalit vikramlalit at gmail.com
Fri Sep 30 15:34:42 EDT 2016


Hiya Alexander,

Thanks very much for the detailed note... very interesting insights.

As you deduced, I omitted some pieces from my email for the sake of
simplicity. I'm using a transient / stateless chat server (ejabberd) that
delivers messages over live sessions / streams without the client having
to do look-ups. So the storage in Riak is post-facto archival, written
after delivery rather than before the client receives the messages. Hence
determining the time key for a look-up isn't going to be an issue unless I
run analytics that query all keys (which, as I now understand from your
comments, would be a problem).

There is of course the question of offline messages, whose delivery does
depend on look-ups, but there ejabberd looks them up by username (the
offline storage also uses a secondary index on leveldb), so the timestamp
isn't important. Riak TS sure looks promising there, but I'll check further
whether the change would be justified for offline messages alone, or
whether other use cases crop up...
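
For my own notes, here is a rough sketch of what I understand your Riak TS
suggestion to look like: a table keyed on topic and time, queried by topic
equality plus a time range. The table name, the sender / body columns and
the 15-minute quantum are placeholders I made up, and the helpers only
build the SQL text; they do not talk to a cluster.

```python
# Sketch only: "messages", "sender", "body" and the 15-minute quantum are
# illustrative assumptions, not anything taken from a real deployment.

def create_table_sql(table="messages", quantum_minutes=15):
    """Build a Riak TS CREATE TABLE statement keyed on (topic, time)."""
    return (
        f"CREATE TABLE {table} (\n"
        "  topic  VARCHAR   NOT NULL,\n"
        "  time   TIMESTAMP NOT NULL,\n"
        "  sender VARCHAR   NOT NULL,\n"
        "  body   VARCHAR,\n"
        f"  PRIMARY KEY ((topic, QUANTUM(time, {quantum_minutes}, 'm')), topic, time)\n"
        ")"
    )


def range_query_sql(topic, start_ms, end_ms, table="messages"):
    """Select one topic's messages in [start_ms, end_ms), in epoch milliseconds."""
    if start_ms >= end_ms:
        raise ValueError("start_ms must be before end_ms")
    if "'" in topic:
        # Keep the sketch simple: refuse topics that would need escaping.
        raise ValueError("topic must not contain single quotes")
    return (
        f"SELECT * FROM {table} "
        f"WHERE topic = '{topic}' "
        f"AND time >= {start_ms} AND time < {end_ms}"
    )


if __name__ == "__main__":
    print(create_table_sql())
    print(range_query_sql("room42", 1475249682000, 1475253282000))
```

As far as I understand, the quantum size should be picked with the typical
query window in mind, since a single query can only span a limited number
of quanta.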

Makes sense that listing all keys in a bucket is expensive, though -
let me see how I can model my data to avoid that!
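
One direction I'm considering, along the lines of your "deterministic
keys" remark: group archived messages into hourly windows per topic, so
that the keys for any time range can be computed rather than listed. This
is only a minimal sketch; the key format and the one-hour window are my
own assumptions.

```python
from datetime import datetime, timedelta, timezone


def window_key(topic, ts):
    """Deterministic key for the hourly window containing ts (timezone-aware)."""
    ts = ts.astimezone(timezone.utc)
    return f"{topic}:{ts:%Y%m%dT%H}"


def keys_for_range(topic, start, end):
    """Every hourly window key covering [start, end], computed without a key listing."""
    cur = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    keys = []
    while cur <= end:
        keys.append(window_key(topic, cur))
        cur += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    start = datetime(2016, 9, 30, 15, 34, 42, tzinfo=timezone.utc)
    end = datetime(2016, 9, 30, 17, 5, 0, tzinfo=timezone.utc)
    for key in keys_for_range("room42", start, end):
        print(key)
```

An analytics job would then fetch each computed key directly instead of
asking Riak to enumerate the bucket.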

Thanks again for your input... very informative.

Cheers.
Vikram


On Fri, Sep 30, 2016 at 12:23 PM, Alexander Sicular <siculars at gmail.com>
wrote:

> Hi Vikram,
>
> Bucket maximums aside, why are you modeling in this fashion? How will you
> retrieve individual keys if you don't know the time stamp in advance? Do
> you have a lookup somewhere else? That is doable with lookup keys, CRDTs, or
> other systems. Are you relying on listing all keys in a bucket? Definitely
> don't do that.
>
> Yes, there is a better way. Use Riak TS. Create a table with a composite
> primary key of topic and time. You can then retrieve by topic equality and
> time range. You can then cache those results in deterministic keys as
> necessary.
>
> If you don't already know, Riak TS is basically (there are some notable
> differences) Riak KV plus the time series data model. Riak TS makes all
> sorts of time series oriented projects easier than modeling them against
> KV. Oh, and you can also leverage KV buckets alongside TS (resource
> limitations notwithstanding).
>
> Would love to hear more,
> Alexander
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone
>
> > On Sep 29, 2016, at 19:42, Vikram Lalit <vikramlalit at gmail.com> wrote:
> >
> > Hi - I am creating a messaging platform wherein I am modeling each topic
> > as a separate bucket. That means there can potentially be millions of
> > buckets, with each message from a user becoming a value under a distinct
> > timestamp key.
> >
> > My question: is there any downside to modeling my data in such a manner?
> > Or can folks advise a better way of storing the same in Riak?
> >
> > Secondly, I would like to modify the default bucket properties (n_val).
> > I understand that such 'custom' buckets have a higher performance overhead
> > due to the extra load on the gossip protocol. Is there a way to change the
> > default n_val of newly created buckets so that, even with the high number
> > of buckets described above, there is no performance degradation? I believe
> > such a setting was allowed in app.config, but I'm not sure that file is
> > used any more now that riak.conf has been introduced.
> >
> > Thanks much.
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>