Getting all the Keys
jeremiah.peschka at gmail.com
Sat Jan 22 22:16:48 EST 2011
If you ever want to think about putting indexes in Riak, I played a little
thought game and wrote it out on my blog:
Otherwise - reverse indexes/roll your own b-tree. As an aside, thanks for
asking your questions, it prompted me to think and go look at some code to
see if I could figure out the answers before someone else came up with them.
I failed, but it was fun. :)
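
A minimal sketch of the reverse index idea, for anyone following along. It assumes a plain Python dict stands in for the cluster; no real Riak client calls are used, and the names `store`, `INDEX_BUCKET`, `put`, `list_keys_by_scan` and `list_keys_by_index` are all made up for illustration. It contrasts listing a bucket by scanning every key with reading one index object that is kept up to date on each write.

```python
import bisect

# Hypothetical stand-in for the whole cluster: one flat dict keyed by
# (bucket, key). This is NOT the Riak client API, only an illustration.
store = {}

# Assumed name for the bucket that holds one index object per data bucket.
INDEX_BUCKET = "_index"


def put(bucket, key, value):
    """Store a value and record its key in the bucket's index object."""
    store[(bucket, key)] = value
    index = store.setdefault((INDEX_BUCKET, bucket), [])
    pos = bisect.bisect_left(index, key)
    # Insert the key in sorted position unless it is already indexed.
    if pos == len(index) or index[pos] != key:
        index.insert(pos, key)


def list_keys_by_scan(bucket):
    """Full scan: visit every key in the store to find one bucket's keys."""
    return sorted(k for (b, k) in store if b == bucket)


def list_keys_by_index(bucket):
    """Single fetch: read the index object that put() maintains."""
    return list(store.get((INDEX_BUCKET, bucket), []))


if __name__ == "__main__":
    put("users", "bob", {"age": 30})
    put("users", "alice", {"age": 25})
    put("orders", "o1", {"total": 12})
    print(list_keys_by_scan("users"))
    print(list_keys_by_index("users"))
```

The scan touches every entry in the store no matter which bucket is asked for, while the index lookup is one read. The sketch ignores that the index object is itself a single key that concurrent writers would contend on in a real cluster.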
On Sat, Jan 22, 2011 at 9:46 PM, Thomas Burdick <tburdick at wrightwoodtech.com> wrote:
> I mistakenly didn't send a reply to the whole list, but given what everyone
> is saying I think I "get it" now and the reasoning.
> Given all of that it seems pretty clear that if I wanted to do what I'm
> talking about purely in the context of riak using links might work or a
> bucket containing keys and values that represent a data structure like a
> list or btree might work. But either way I guess it's up to me if I want to
> make an index/faster method of traversal of keys. That's fine, I accept that's
> the cost of using a dynamo database for now :-)
> Thanks for all the insights and comments.
> Tom Burdick
> On Sat, Jan 22, 2011 at 7:22 PM, Sean Cribbs <sean at basho.com> wrote:
>> On Jan 22, 2011, at 4:15 PM, Thomas Burdick wrote:
>> > * Why is key listing so slow?
>> It is slow because, even if the keys are in RAM, you have to scan roughly
>> all of the keys in the cluster to get a listing for a single bucket. As a
>> certain person is fond of saying, "full table scan is full table scan".
>> There are ways to improve this, but without single-arbiters of state (and
>> points of failure) it is very costly.
>> > * What do people do in the context of purely using riak to do what I
>> > want, have a big set of keys to iterate over?
>> As others have said so eloquently, they don't, they use something else. Or
>> they try to minimize how frequently they do it. Part of the current
>> revolution in data storage is about realizing that no one tool is going to
>> completely fit your needs, and that that's good and right. Anyone who tells
>> you otherwise is selling you a bill of goods.
>> To understand why listing keys is difficult, you have to understand Riak's
>> (and Dynamo's) original design motivations:
>> * To be basically available at all times for reads and writes, which in
>> turn means to be tolerant of machine and network failures.
>> * To provide low-latency random access to large data sets. (Note I didn't
>> say an entire data set.)
>> * To scale linearly with minimal operational complexity.
>> Everything has tradeoffs - these are the ones we chose with Riak. Now, we
>> (Basho) are actively trying to create ways to make discovering your data
>> easier (key-filters are one of them, as Justin mentioned we're discussing
>> counters and indices), but the majority of people who use Riak have ways of
>> discovering or knowing keys ahead of time. If that's not your case, you
>> should look into other solutions; some good ones have been mentioned in this
>> thread. That said, we hear your pain and are working hard to improve
>> usability while maintaining the properties discussed above.
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.