Getting all the Keys

Jeremiah Peschka jeremiah.peschka at gmail.com
Sat Jan 22 22:16:48 EST 2011


<ShamelessPlug>
If you ever want to think about putting indexes in Riak, I played a little
thought game and wrote it out on my blog:
http://facility9.com/2010/12/16/secondary-indexes-how-would-you-do-it
</ShamelessPlug>

Otherwise - reverse indexes/roll your own b-tree. As an aside, thanks for
asking your questions; they prompted me to think and go look at some code to
see if I could figure out the answers before someone else came up with them.
I failed, but it was fun. :)
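
To make the reverse-index idea a bit more concrete, here is a minimal sketch
in Python. Everything in it is an assumption for illustration: KVStore is a
hypothetical in-memory stand-in for a store that only supports get/put/delete
by (bucket, key), not a real Riak client, and the "indexes" bucket name and
the helper functions are made up. The point is only the shape of the
technique: every write also updates a single index object that records the
bucket's keys, so listing reads one object instead of scanning everything.

```python
import json


class KVStore:
    """Hypothetical stand-in for a key/value store (NOT a Riak client).

    It only offers get/put/delete by (bucket, key), with no key listing.
    """

    def __init__(self):
        self._data = {}

    def get(self, bucket, key):
        return self._data.get((bucket, key))

    def put(self, bucket, key, value):
        self._data[(bucket, key)] = value

    def delete(self, bucket, key):
        self._data.pop((bucket, key), None)


INDEX_BUCKET = "indexes"  # hypothetical bucket holding one index object per bucket


def _load_index(store, bucket):
    # The index object is a JSON list of keys, stored under the bucket's name.
    raw = store.get(INDEX_BUCKET, bucket)
    return set(json.loads(raw)) if raw is not None else set()


def _save_index(store, bucket, keys):
    store.put(INDEX_BUCKET, bucket, json.dumps(sorted(keys)))


def put_indexed(store, bucket, key, value):
    """Write the value, then record its key in the bucket's index object."""
    store.put(bucket, key, value)
    keys = _load_index(store, bucket)
    keys.add(key)
    _save_index(store, bucket, keys)


def delete_indexed(store, bucket, key):
    """Delete the value, then drop its key from the bucket's index object."""
    store.delete(bucket, key)
    keys = _load_index(store, bucket)
    keys.discard(key)
    _save_index(store, bucket, keys)


def list_keys(store, bucket):
    """List keys with a single read of the index object, no scan."""
    return sorted(_load_index(store, bucket))


store = KVStore()
put_indexed(store, "users", "bob", "some value")
put_indexed(store, "users", "alice", "another value")
print(list_keys(store, "users"))
```

The catch is that the index update is a read-modify-write on one object, so
concurrent writers can clobber each other; in an eventually consistent store
you would have to merge conflicting versions of the index yourself, and one
big index object eventually wants to be split up (hence the b-tree).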

Jeremiah Peschka


On Sat, Jan 22, 2011 at 9:46 PM, Thomas Burdick <tburdick at wrightwoodtech.com> wrote:

> I mistakenly didn't send a reply to the whole list, but given what everyone
> is saying I think I "get it" now and the reasoning.
>
> Given all of that, it seems pretty clear that if I wanted to do what I'm
> talking about purely in the context of Riak, using links might work, or a
> bucket containing keys and values that represent a data structure like a
> list or btree might work. Either way, I guess it's up to me if I want to
> build an index or a faster method of traversing keys. That's fine; I accept
> that's the cost of using a Dynamo-style database for now :-)
>
> Thanks for all the insights and comments.
>
> Cheers,
> Tom Burdick
>
>
> On Sat, Jan 22, 2011 at 7:22 PM, Sean Cribbs <sean at basho.com> wrote:
>
>> On Jan 22, 2011, at 4:15 PM, Thomas Burdick wrote:
>>
>> > * Why is key listing so slow?
>>
>> It is slow because, even if the keys are in RAM, you have to scan roughly
>> all of the keys in the cluster to get a listing for a single bucket.  As a
>> certain person is fond of saying, "full table scan is full table scan".
>>  There are ways to improve this, but without single-arbiters of state (and
>> points of failure) it is very costly.
>>
>> > * What do people do in the context of purely using riak to do what I
>> > want, have a big set of keys to iterate over?
>>
>> As others have said so eloquently, they don't, they use something else. Or
>> they try to minimize how frequently they do it.  Part of the current
>> revolution in data storage is about realizing that no one tool is going to
>> completely fit your needs, and that that's good and right.  Anyone who tells
>> you otherwise is selling you a bill of goods.
>>
>> To understand why listing keys is difficult, you have to understand Riak's
>> (and Dynamo's) original design motivations:
>>
>> * To be basically available at all times for reads and writes, which in
>> turn means to be tolerant of machine and network failures.
>> * To provide low-latency random access to large data sets. (Note I didn't
>> say an entire data set.)
>> * To scale linearly with minimal operational complexity.
>>
>> Everything has tradeoffs - these are the ones we chose with Riak. Now, we
>> (Basho) are actively trying to create ways to make discovering your data
>> easier (key-filters are one of them; as Justin mentioned, we're discussing
>> counters and indices), but the majority of people who use Riak have ways of
>> discovering or knowing keys ahead of time.  If that's not your case, you
>> should look into other solutions; some good ones have been mentioned in this
>> thread.  That said, we hear your pain and are working hard to improve
>> usability while maintaining the properties discussed above.
>>
>> Cheers,
>>
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>>
>