Getting all the Keys

Alexander Sicular siculars at gmail.com
Sat Jan 22 15:18:45 EST 2011


I don't think it is a flaw at all. Rather, I am of the opinion that
Riak was never meant to do the things we are all talking about in this
thread.

When I need to do these things I specifically use Redis because, as
noted, it has tremendous support for specific data structures. When I
need to enumerate keys or mutate counters, I use Redis and periodically
dump those values to Riak. I'll write a post or something about it.
In short: use Redis if you want to do these things.
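To make that pattern concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not real client code: the FakeRedis and FakeRiak classes are in-memory stand-ins so the sketch is self-contained, where real code would call a Redis client (INCR, RPUSH, LRANGE) and a Riak client's put.

```python
# Hedged sketch of the pattern above: keep mutable state (counters,
# key lists) in Redis, and periodically flush snapshots to Riak as
# plain values.  The classes below are stand-ins for the two stores.

class FakeRedis:
    """Minimal stand-in for the Redis commands the pattern needs."""
    def __init__(self):
        self.counters = {}
        self.lists = {}

    def incr(self, key):              # like Redis INCR: atomic counter
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def rpush(self, key, value):      # like Redis RPUSH: atomic append
        self.lists.setdefault(key, []).append(value)

    def lrange(self, key):            # like Redis LRANGE 0 -1
        return list(self.lists.get(key, []))

class FakeRiak:
    """Stand-in for a Riak bucket: a last-write-wins KV store."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

def flush_to_riak(redis, riak):
    """Periodic job: snapshot the Redis state into Riak values."""
    for key, count in redis.counters.items():
        riak.put("counter:" + key, count)
    for key in redis.lists:
        riak.put("keylist:" + key, redis.lrange(key))

redis, riak = FakeRedis(), FakeRiak()
for k in ("a", "b", "c"):
    redis.rpush("users", k)   # cheap atomic appends stay in Redis
    redis.incr("inserts")
flush_to_riak(redis, riak)    # durable snapshots land in Riak
```

The design choice here is that Redis absorbs the high-rate mutations it is good at, while Riak only ever sees whole-value writes, which is the access pattern it is good at.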

I'll drop a phat tangent and just mention that I watched @rk's talk at
QCon SF 2010 the other day and am kinda crushing on how they
implemented distributed counters in Cassandra (mainlined in 0.7.1,
methinks), which, imho, is so choice for a Riak implementation it isn't
even funny. It was like pow pow in da face and my face got melted.

@siculars on twitter
http://siculars.posterous.com

Sent from my iPhone

On Jan 22, 2011, at 14:45, Gary William Flake <gary at flake.org> wrote:

> This is a really big pain point for me as well and -- at the risk of  
> prematurely being overly critical of Riak's overall design -- I  
> think it points to a major flaw of Riak in its current state.
>
> Let me explain....
>
> Riak is bad at enumerating keys.  We know that. I am happy to manage  
> a list of keys myself.  Fine.  How do I do that in Riak?
>
> Well, the obvious solution is to have a special object that you
> maintain that is a list of the keys you need.  So, each time you
> insert a new object, you effectively append its key to the end of a
> list that is itself the value of a special index key.
>
> But what is an append in Riak?  The only way to implement a list
> append is to:
>
> 1. read in the entire value of your list object,
> 2. append to this list at the application layer, and
> 3. write the new list back into Riak.
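A minimal sketch of those three steps in Python (a plain dict stands in for a Riak bucket, and the "index" key name is illustrative; real code would go through a Riak client's get and put):

```python
# Hedged sketch: a dict stands in for a Riak bucket.  Every insert
# pays for re-reading and re-writing the entire key list.

kv = {"index": []}   # the special object whose value is the key list

def insert_with_index(kv, key, value):
    kv[key] = value              # store the new object itself
    index = kv["index"]          # 1. read the entire list value
    new_index = index + [key]    # 2. append at the application layer
    kv["index"] = new_index      # 3. write the whole list back

for k in ("k1", "k2", "k3"):
    insert_with_index(kv, k, "value-of-" + k)
```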
>
> This is a horrible solution for at least three reasons.  First,
> inserting N new keys while maintaining your own list is now O(N*N)
> in runtime complexity, because each append does I/O proportional to
> the size of the entire list.  Second, this operation should happen
> entirely at the data layer, not between the data and app layers.
> Third, it introduces write contention: two clients may try to append
> at approximately the same time, leaving you with an inconsistent
> list.
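The O(N*N) claim checks out with a little arithmetic: append number i re-reads i-1 entries and writes back i, so it touches 2i-1 entries, and summing that over N appends gives exactly N^2. A quick sketch:

```python
def total_io(n):
    # append number i reads back i-1 entries and writes i entries,
    # i.e. it touches 2i-1 entries; sum that over all n appends
    return sum((i - 1) + i for i in range(1, n + 1))

# 1,000 appends move a million entries' worth of I/O, not a thousand
assert total_io(1000) == 1000 * 1000
```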
>
> The conclusion for me is that you can't efficiently enumerate keys
> with Riak even if you roll your own key index (in anything close to
> an ideal way).
>
> To overcome this problem, Riak desperately needs to either maintain  
> its own key index efficiently, or it needs to support atomic  
> mutations on values.
>
> For an example of the latter approach, see Redis, which I think
> handles this beautifully.
>
> In the end, you may need to think about redesigning your data model  
> so that there never is a need to enumerate keys.  I am trying this  
> and I use a combination of:
>
> 1. Standard KV approaches,
> 2. Riak search for being able to enumerate some records in order,
> 3. Transaction logs stored in a special bucket,
> 4. Batched M/R phases on the Transaction logs to avoid write  
> contention, and
> 5. Batched rebuilding of "views" in a view bucket.
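A hedged sketch of how points 3-5 might fit together (dicts stand in for the two buckets, and the per-entry UUID keys are an assumption of mine): each write creates a fresh log object under a unique key, so concurrent writers never read-modify-write the same value, and a batch job, the role M/R plays above, folds the log into an enumerable view.

```python
import uuid

txlog = {}   # stand-in for the transaction-log bucket
views = {}   # stand-in for the rebuilt "view" bucket

def log_insert(key):
    # every write gets its own log object under a unique key, so
    # there is no read-modify-write and no write contention
    txlog[uuid.uuid4().hex] = {"op": "insert", "key": key}

def rebuild_view():
    # batched rebuild: fold the whole log into one enumerable key list
    keys = sorted(entry["key"] for entry in txlog.values())
    views["all-keys"] = keys

for k in ("k2", "k1", "k3"):
    log_insert(k)
rebuild_view()
```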
>
> Given that Riak Search is loudly proclaimed as being beta, this
> makes me fairly anxious.
>
> I am very close to not needing to enumerate keys the bad way now.
> However, I would have killed for an atomic mutator like Redis's.
>
> BTW, I would love for someone from Basho to disabuse me of my  
> conclusions in this note.
>
> -- GWF
>
>
> On Sat, Jan 22, 2011 at 10:40 AM, Alexander Staubo
> <lists at purefiction.net> wrote:
> On Sat, Jan 22, 2011 at 19:34, Alexander Staubo
> <lists at purefiction.net> wrote:
> > On Sat, Jan 22, 2011 at 18:23, Thomas Burdick
> > <tburdick at wrightwoodtech.com> wrote:
> >> So really, what's the solution to just having a list of, say, 50k
> >> keys that can quickly be appended to without taking seconds to
> >> retrieve later on?  Or is this just not a valid use case for Riak
> >> at all?  That would suck, because again, I really like the notion
> >> of an AP-oriented database!
>
> I accidentally hit "Send".  Again: I have been struggling with the same
> issue. You may want to look at Cassandra, which handles sequential key
> range traversal very well. Riak also has a problem with buckets
> sharing the same data storage (buckets are essentially just a way to
> namespace keys), so if you have two buckets and fill up one of them,
> then enumerating the keys of the empty bucket will take a long time
> even though it's empty. Cassandra does not have a problem with this,
> since Cassandra's keyspaces are separate data structures. I like Riak,
> but it only works well with single-key/linked traversal, not this kind
> of bucket-wide processing.
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

