Can my pagination approach scale?

Anton theatilla at gmail.com
Tue Jan 22 09:46:13 EST 2013


You can get a rough idea of how well your approach will perform with
basho_bench. Estimate how big your pages will be, set up an
appropriate benchmark, and run it against the cluster or a staging
setup so you can see what performance to expect.
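
For example, something along the lines of the riakc_pb example config
that ships with basho_bench (the value size and the get-heavy
operation mix below are my guesses for a page-oriented workload; tune
them to yours):

{mode, max}.
{duration, 10}.
{concurrent, 10}.
{driver, basho_bench_driver_riakc_pb}.
{riakc_pb_ips, [{127,0,0,1}]}.
{key_generator, {int_to_bin_bigendian, {uniform_int, 10000}}}.
{value_generator, {fixed_bin, 10240}}.  %% ~ one page worth of data
{operations, [{get, 4}, {update, 1}]}.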

I don't think there's anything fundamentally wrong with your approach.
In fact, I'm working on a similar storage scheme and I'm fairly
comfortable with it. You can find examples of real-world applications
at http://docs.basho.com/riak/latest/cookbooks/use-cases/. The Yammer
presentation linked at
http://docs.basho.com/riak/latest/cookbooks/use-cases/user-events-timelines/
covers similar ideas and is worth checking out.
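
For what it's worth, here is a rough, untested sketch of how I'd
write the insert and read paths you describe, using the riakc client
and statebox. All the helper names (page_key/1, read_cursor/1, and so
on) are mine, and I've glossed over vclock handling and
statebox:truncate/2, both of which you'd want in real code:

-module(paged_stream).
-export([insert/2, read_latest/1]).

-define(BUCKET, <<"pages">>).
-define(CURSOR_KEY, <<"cursor">>).
-define(PAGE_SIZE, 100).

page_key(N) ->
    list_to_binary("page_" ++ integer_to_list(N)).

%% The cursor only ever grows, so siblings resolve to the max.
read_cursor(Pid) ->
    case riakc_pb_socket:get(Pid, ?BUCKET, ?CURSOR_KEY) of
        {ok, Obj} ->
            lists:max([binary_to_term(V)
                       || V <- riakc_obj:get_values(Obj)]);
        {error, notfound} ->
            0
    end.

%% Page objects are statebox-wrapped ordsets; siblings are merged
%% on read.
read_page(Pid, Key) ->
    case riakc_pb_socket:get(Pid, ?BUCKET, Key) of
        {ok, Obj} ->
            statebox:merge([binary_to_term(V)
                            || V <- riakc_obj:get_values(Obj)]);
        {error, notfound} ->
            statebox:new(fun ordsets:new/0)
    end.

store(Pid, Key, Term) ->
    riakc_pb_socket:put(Pid,
                        riakc_obj:new(?BUCKET, Key,
                                      term_to_binary(Term))).

insert(Pid, DocId) ->
    Cursor = read_cursor(Pid),
    Box = read_page(Pid, page_key(Cursor)),
    case length(statebox:value(Box)) >= ?PAGE_SIZE of
        true ->
            %% Current page is full: bump the cursor and open a
            %% fresh page for this document.
            ok = store(Pid, ?CURSOR_KEY, Cursor + 1),
            Fresh = statebox:new(fun ordsets:new/0),
            Box1 = statebox:modify({fun ordsets:add_element/2,
                                    [DocId]}, Fresh),
            store(Pid, page_key(Cursor + 1), Box1);
        false ->
            Box1 = statebox:modify({fun ordsets:add_element/2,
                                    [DocId]}, Box),
            store(Pid, page_key(Cursor), Box1)
    end.

%% Flake ids sort chronologically within a page, so reverse for
%% newest-first display.
read_latest(Pid) ->
    Cursor = read_cursor(Pid),
    lists:reverse(statebox:value(read_page(Pid, page_key(Cursor)))).

Note that two concurrent writers can both decide the page is full and
both create the next page; statebox merging makes that harmless,
though a page can overshoot the size limit under concurrent writes,
which matches the caveat in your message.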



On 22 January 2013 14:56, Bach Le <thebullno1 at gmail.com> wrote:
> Hi, I'm currently using Riak for my project. It works well for single
> documents; however, I often need to present users with a stream of
> (loosely) time-ordered documents. Riak's keys are unordered by nature,
> so there's no straightforward way of traversing data in order. I came
> up with the following approach:
>
> Make a bucket (e.g. "pages") and set allow_mult to true. Inside this
> bucket, store a number that points to the "current" page; it is
> initialized to 0, and I call it the cursor. For every "page" of data,
> create an object in the same bucket: the first page is associated with
> the key page_0, the second with page_1, and so on. These page objects
> are sets modeled with statebox for conflict resolution.
>
> When a document is inserted, read the cursor value. Since the cursor
> can only increase, conflicts are resolved by choosing the largest
> value among the siblings. Next, read the page it points to (if the
> cursor is 0, read the key "page_0"; if it is 1, read "page_1"; and so
> on). If the number of objects in this set exceeds the page size,
> increment the cursor and create a new page to insert the document
> into; otherwise, leave the cursor alone and insert into the current
> page.
>
> To retrieve data in reverse chronological order, read the cursor to
> find the current page, i.e. the last page, which is shown to users as
> the first page.
>
> Currently, my document ids are monotonically increasing (generated
> with https://github.com/boundary/flake), so I can sort documents
> within a page.
>
> I do realize that a page can exceed its size limit; however, I don't
> know how bad that can get as the write rate grows. All I need is some
> form of bulk get and chunking without resorting to 2i, whose queries
> have to cover the whole cluster.
>
> So, is there any major problem with this approach? Thanks.
>



