Most efficient way to determine if 1000 specific keys exist?

Markus Silpala msilpala at gmail.com
Sat May 5 13:32:54 EDT 2012


Hey Tim,

Riak does not maintain a key index in the sense that I think you're looking for, and scanning across all keys in a bucket is consequently a fairly expensive operation (see List Keys for more info on that). A map-reduce query that takes the entire bucket as input also begins with a key-scan op under the covers. Since you already know the set of keys you're looking for, you should probably do a direct fetch of each key. This will likely perform a lot better and put less load on the system.

To avoid reading the objects, you might consider doing a GET for each key and using the "If-Modified-Since" header to avoid loading the actual object data. I haven't tried this, but if you provide the current time as the argument to If-Modified-Since, then every object present should report as "Not Modified" and return a 304 without actually reading the object data. Thus, for objects not present you'll get a 404 and for objects present you'll get a 304.

Mind you, I have not actually tried this, and I'm sure there are edge cases where you may get a 200 and the full object returned. Test, measure, and verify before committing.
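
Here is a rough Python sketch of the conditional-GET idea, untested against a real cluster like the rest of this suggestion. The host, port, bucket name, and the /riak/<bucket>/<key> URL form are assumptions to adjust for your setup, and the helper names (fetch_status, classify, existing_keys) are made up for illustration. It treats 304 and 200 as "present" and 404 as "missing"; it also counts 300 as present, on the assumption that Riak answers that way for a key with siblings.

```python
import urllib.error
import urllib.parse
import urllib.request
from email.utils import formatdate

# Assumed values: change these to match your cluster.
RIAK_URL = "http://127.0.0.1:8098"
BUCKET = "mybucket"


def fetch_status(key, base_url=RIAK_URL, bucket=BUCKET):
    """Send a GET with If-Modified-Since set to now; return the HTTP status."""
    url = "%s/riak/%s/%s" % (
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    # formatdate(usegmt=True) yields the current time as an HTTP date.
    headers = {"If-Modified-Since": formatdate(usegmt=True)}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 304 and 404; the code is what we want.
        return err.code


def classify(status):
    """Map a status code to True (present) or False (missing)."""
    if status == 404:
        return False
    if status in (200, 300, 304):
        # 200 is the edge case where the full object came back anyway;
        # 300 is assumed to mean the key has siblings.
        return True
    raise RuntimeError("unexpected HTTP status %r" % (status,))


def existing_keys(keys, status_for=fetch_status):
    """Return the subset of keys that appear to exist, in input order."""
    return [key for key in keys if classify(status_for(key))]
```

Calling existing_keys(list_of_1000_keys) issues one request per key, sequentially. Passing a different status_for function lets you try the classification logic without a running node.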

For details, check out the Fetch Object page: http://wiki.basho.com/HTTP-Fetch-Object.html

Both of the links above assume you're using the HTTP API. The PB API doesn't seem to have the same option, so this may or may not work for your case.

Good luck with it!

-Markus Silpala


On May 2, 2012, at 1:47 PM, Tim Haines wrote:

> Hey guys,
> 
> Still a relative newbie here.
> 
> I was hoping to be able to set up a MapReduce job that I could feed 1000 keys to, and have it tell me which of the 1000 keys exist in the bucket.  I was hoping this could use the key index (such a thing exists, right?) without having to read the objects.
> 
> The methods I've tried for doing this fail when the first non-existing key is found though.
> 
> Is there a way to do this?
> 
> Or alternatively, is there a way to check for the presence of one key at a time without riak having to read the object?
> 
> Cheers,
> 
> Tim.
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
