newbie questions: sorted keys, ADT primitives, and link manipulation

Jeremiah Peschka jeremiah.peschka at gmail.com
Sun Jan 16 09:49:06 EST 2011


Alex Sicular posted about this on his blog the other day:
http://siculars.posterous.com/paginating-with-riak Basically, he uses a
combination of Riak and Redis to manage pagination.

If you're absolutely opposed to using multiple data stores, Eric's approach
should work. Otherwise, I'd offload sorting and that sort of thing to Redis
and use Riak as your data store. In memory caching is a good thing :)

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA


On Sat, Jan 15, 2011 at 7:23 PM, Gary William Flake <gary at flake.org> wrote:

> I am building a backend for a web service and Riak looks to be a strong fit
> for my needs at this point.  However, there is one really simple requirement
> that I can't figure out how to implement on Riak with any sort of
> efficiency.  To simplify the question, suppose that I want a twitter-like
> service that has billions of smallish documents that enter into the backend
> over a period of time.  Typical read pattern will be:
>
> 1. List the last N documents
> 2. Given a document, see the N that came before this one.
> 3. From a random point, see the N documents before this one.
>
> Hence, what I really want is to be able to access the documents in sorted
> order by timestamp.  I know that I can do all of this with a map/reduce job,
> but this solution literally requires the backend to scan each and every
> document which seems less than ideal.
>
> Is there a generally accepted "right" way to do this on Riak?  Or is there
> no known way to do this gracefully, and I need to be prepared to roll my own
> secondary indices?  Or, can this be handled with a Riak Search range query?
>
> ###
>
> The second topic is concerned with finding an elegant way to address the
> first question.  One way to solve the problem above is to use a bucket as a
> poor man's list.  At the application layer, when I insert a new document, I
> could create a new k:v pair in my "list" bucket with the key being an
> integer corresponding to this document's position in the corpus, and the
> value being the actual key in the document bucket.  With this scheme, I
> could satisfy the scenario above with the following observations:
>
> 1. Once I have the list key for a document, I could find the N before it in
> O(N) time which is an improvement over the map/reduce solution which is
> O(size of corpus) / O(cluster size).
>
> 2. This solution assumes atomicity that doesn't exist in practice (e.g.,
> the operations "new id <- number of docs", and "add a new doc with key new
> id" have to complete as an atomic operation if multiple clients are
> attempting an insert).
>
> 3. This solution also wants a multiget styled function to map index values
> to true document keys, and then to get the resulting document.
>
> I know what you are thinking: just use a serialized JSON array as the value
> of the list that you want, then just get it, deserialize it, mutate it, then
> update its value.  But this is horrific because each insertion takes
> O(corpus size) time.
>
> Bottom line: life sucks no matter which route we take.
>
> What I think is needed are redis styled objects that support O(1) atomic
> operations.  For example, if I simply had a k:v pair where the value was a
> list which supported insert/delete front/rear, then we are golden.
>
> ###
>
> This brings me to my third question / observation.  In scanning over the
> archives of this list, I see that others have requested this feature before,
> but I haven't seen any mention of the feature being considered.  (Please,
> please!  I hope I am wrong).
>
> So, I want to offer what I think is the minimal feature request that
> supports the scenarios that I am trying to achieve.  Basically, I want to
> build off of links, allowing them to generalize into container types.  My
> assumptions:
>
> 1. Links have a natural order in that they alway come out in the same order
> that they are created (at least that's what the python client API states,
> but I can't confirm it in the Riak docs).
>
> 2. Links seem to support mutation operations, meaning that I can add a link
> and remove a link in O(1) time (and not in time proportional to the number
> of links on the record).
>
> If both of these are true, then all that is needed is to add simple O(1)
> mutation operations allowing one to treat a records links as a special
> container that references to other records.  Insert/delete front/rear will
> give you stacks and queues.  A small addition to the API would support sets.
>
> ###
>
> That's all for now.  I hope someone can chime in with something to add on
> top.  My real hope is that I've misunderstood the capabilities of Riak and
> that what I want to do can be done today.  But if that's not the case, I am
> curious if others have the same needs and if they know how to work around
> them.
>
> Thanks,
> -- GWF
>
>
>
>
>
>
>
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20110116/56acd84c/attachment.html>


More information about the riak-users mailing list