Getting all the Keys

Dmitri Zagidulin dzagidulin at basho.com
Thu Apr 25 13:08:05 EDT 2013


Hi Chuck,

So, there is currently no support for listing keys by simply issuing a GET
to /buckets/bucketname/.
Part of the reason for that is that there are many operations you can
perform on the bucket resource: list keys, get bucket properties, etc.
That's why there are several URLs for specifying what you want to do with
the bucket, such as /buckets/bucketname/keys, /buckets/bucketname/props,
etc. (very much in keeping with the REST philosophy).

Your best bet for listing keys is the "streaming list keys" method:
/buckets/bucketname/keys?keys=stream
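
A rough sketch of consuming that stream in Python (standard library only).
It assumes a node at http://localhost:8098 and assumes the streamed body
arrives as a series of concatenated JSON objects of the form
{"keys": [...]}; verify that against your Riak version. iter_stream_keys
and stream_keys are hypothetical helper names, not part of any Riak client.

```python
import codecs
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def iter_stream_keys(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from concatenated {"keys": [...]} JSON objects.

    Assumption: this is the shape of a keys=stream response body.
    An object may be split across chunk boundaries, so text is buffered
    until a complete object can be decoded.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode rejects leading whitespace
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete object: wait for the next chunk
            buffer = buffer[end:]
            yield from obj.get("keys", [])
    if buffer.strip():
        raise ValueError("stream ended in the middle of a JSON object")


def stream_keys(bucket: str, host: str = "http://localhost:8098") -> Iterator[str]:
    """Hypothetical helper: GET .../keys?keys=stream, yield keys as they arrive."""
    url = f"{host}/buckets/{urllib.parse.quote(bucket, safe='')}/keys?keys=stream"
    utf8 = codecs.getincrementaldecoder("utf-8")()  # handles split multibyte chars

    with urllib.request.urlopen(url) as resp:
        def text_chunks() -> Iterator[str]:
            # Read fixed-size pieces so client memory stays bounded.
            while True:
                data = resp.read(8192)
                if not data:
                    break
                yield utf8.decode(data)
            yield utf8.decode(b"", final=True)

        yield from iter_stream_keys(text_chunks())
```

Usage would be something like `for key in stream_keys("mybucket"): ...`,
processing each batch of keys as it arrives instead of waiting for the whole
list. The cluster still does the full key scan either way; streaming only
avoids building one giant reply.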

What you definitely DON'T want to do in production is use the regular list
keys call, /buckets/bucketname/keys?keys=true. It builds up a giant JSON
array of all the keys in memory on the coordinating node and only then
sends the whole thing back in the reply. As a result, it usually takes far
too long and, with a large enough number of keys, often ends up eating all
of the memory on that node and erroring out.
The only time the regular, non-streaming list keys operation is useful is
when you're developing against a toy dataset and want to quickly list the
keys in a bucket via curl or in a browser (streaming list keys doesn't
work as well in a browser).
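
For comparison, a sketch of the non-streaming call, again in Python with
only the standard library. It assumes a node at http://localhost:8098 and a
reply body of the form {"keys": [...]}; parse_keys_body and
list_keys_buffered are hypothetical names. It is only reasonable against a
toy dataset, for the reasons above.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def parse_keys_body(body: bytes) -> List[str]:
    """Parse the single JSON document assumed to look like {"keys": [...]}."""
    return json.loads(body.decode("utf-8"))["keys"]


def list_keys_buffered(bucket: str, host: str = "http://localhost:8098",
                       timeout: float = 30.0) -> List[str]:
    """Hypothetical helper for toy datasets only: GET .../keys?keys=true."""
    url = f"{host}/buckets/{urllib.parse.quote(bucket, safe='')}/keys?keys=true"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Nothing can be parsed until the coordinating node has collected
        # every key and the entire body has arrived, so the full key list
        # ends up in memory on that node and again in this client.
        return parse_keys_body(resp.read())
```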

So, to recap:

1) Listing keys in non-trivial datasets (for logical backups, etc.): use
streaming list keys (keys=stream).

2) Listing keys in a browser or with curl while developing against toy
datasets: use non-streaming list keys (keys=true).

Dmitri



On Thu, Apr 25, 2013 at 12:38 PM, n6mac41717 <csh at stanfordalumni.org> wrote:

> I know it's been over two years since this post, and I'm wondering if the
> latest version of Riak has made improvements to list keys. I tried the
> query with "keys=true" and didn't seem to hit TSA/octomom-level wait times.
>
> I was originally hoping that I could get a list of keys via the RESTful API,
> which led me to this thread. In other words, a GET url/bucket/key will
> indeed return what I shoved into the bucket at that key, but I was hoping
> that a GET url/bucket (I guess to be truly RESTful, I should make the
> bucket plural) would return the keys.
>
> Thoughts?
>
> Thanks in advance, Chuck
>
>
> Alexander Sicular wrote
> > Hi Thomas,
> >
> > This is a topic that has come up many times. Lemme just hit a couple of
> > high notes in no particular order:
> >
> > - If you must do a list keys op on a bucket, you must must must use
> > "?keys=stream". "?keys=true" will block on the coordinating node until
> > all nodes return their keys; "?keys=stream" will start sending keys as
> > soon as the first node returns.
> >
> > - "List keys" is one of the most expensive native operations you can
> > perform in Riak. Not only does it do a full scan of all the keys in
> > your bucket, but of all the keys in your cluster. It is obnoxiously
> > expensive, and only more so as the number of keys in your cluster grows.
> > There have been discussions about changing this, but everything comes
> > with a cost (more open file descriptors), and I do not believe a
> > decision has been made yet.
> >
> > - Riak is in no way a relational system. It is, in fact, about as close
> > to the opposite as you can get. Incidentally, "select *" is generally
> > not recommended in the Kingdom of Relations either, and is regarded as
> > wasteful. You need a bit of a mind shift from the relational world to
> > have success with NoSQL in general and Riak in particular.
> >
> > - There are no native indices in Riak. By default Riak uses the Bitcask
> > backend. Bitcask has many advantages, but one disadvantage is that all
> > keys (key length + a bit of overhead) must fit in RAM.
> >
> > - Do not use "?keys=true". Your computer will melt. And then your face.
> >
> > - As of Riak 0.14, your MapReduce (m/r) jobs can filter on key names. I
> > would highly recommend that your data architecture take this into
> > account by using keys with meaningful names. This will let you avoid
> > scanning every key in your cluster.
> >
> > - Buckets are analogous to relational tables, but only just. In Riak,
> > you can think of a bucket as a namespace holder (the bucket name is part
> > of what the default consistent hash function hashes), but primarily as a
> > mechanism for applying different settings to different groups of keys.
> >
> > - There is no penalty for having an unlimited number of buckets, except
> > when their settings deviate from the system defaults. By settings I mean
> > things like hooks, replication values and backends, among others.
> >
> > - One should list keys by truth if one enjoys sitting in parking lots on
> > the freeway on a scorching summer's day, or perhaps waiting in a TSA
> > line at your nearest international point of embarkation, surrounded by
> > octomom families, all the while juggling between the grope and the pr0n
> > slideshow. If that is for you, use "?keys=true".
> >
> > - Virtually everything in Riak is transient. That is, for the most part
> > (not counting the 60 seconds or so of m/r cache), there is no caching
> > going on in Riak beyond what the operating system does, so subsequent
> > queries will do more or less the same work as their predecessors. You
> > need to cache your own results if you want to reuse them... quickly.
> >
> >
> >
> > Oh, there's more but I'm pretty jelloed from last night. Welcome to the
> > fold, Thomas. Can I call you Tom?
> >
> > Cheers,
> > -Alexander Sicular
> >
> > @siculars
> >
> > On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
> >
> >> I've been playing around with Riak lately, really my first use of a
> >> distributed key/value store. I quite like many of the concepts and
> >> possibilities of Riak and what it may deliver; however, I'm really
> >> stuck on an issue.
> >>
> >> Doing the equivalent of a "select * from sometable" in Riak seems
> >> slow. As a quick test I tried...
> >>
> >> http://localhost:8098/riak/mytable?keys=true
> >>
> >> Even before iterating over the keys, this was already unbearably slow.
> >> It took almost half a second on my machine, and mytable is completely
> >> empty!
> >>
> >> I'm a little baffled. I would assume that getting all the keys of a
> >> table is an incredibly common task? How do I get all the keys of a
> >> table quickly? By quickly I mean a few milliseconds or less, as I would
> >> expect of even a "slow" RDBMS with an empty table; even with 1000's of
> >> items, a SQL table can return all its primary keys in a few
> >> milliseconds.
> >>
> >> Tom Burdick
> >>
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users at .basho
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Getting-all-the-Keys-tp2308764p4027757.html
> Sent from the Riak Users mailing list archive at Nabble.com.