Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

Sean Cribbs sean at basho.com
Thu May 26 14:21:50 EDT 2011


Kyle, you bring up a good point that I feel strongly about -- most people use list-keys as a substitute for better solutions (also known as an anti-pattern).  In most cases what they really need is one of:

* Better key/schema design, so keys are at least guessable if not knowable.
* Secondary indexes, which can sometimes be built manually, but are also in the product roadmap.
* Full-text search.

None of these require list keys, but all of them require strong knowledge of your problem domain and creative thinking.
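
To make the first two options concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption made for illustration: FakeBucket is an in-memory stand-in rather than the riak-client API, and link_key, ManualIndex, and the "idx:" key prefix are hypothetical names. The idea is only that a key built from facts the application already knows can be fetched directly, and that a hand-built index is just one more object whose value is a set of primary keys.

```ruby
require 'set'

# In-memory stand-in for a bucket (hypothetical; NOT the riak-client API).
class FakeBucket
  def initialize
    @data = {}
  end

  def put(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Guessable key: derived from facts the application already knows.
# The "<doc id>-<page>.xml" scheme is an assumption for this example.
def link_key(doc_id, page)
  "#{doc_id}-#{page}.xml"
end

# Manual secondary index: one extra object per indexed value, whose
# body is the set of primary keys carrying that value.
class ManualIndex
  def initialize(bucket, name)
    @bucket = bucket
    @name = name
  end

  def add(value, key)
    entry = lookup_set(value)
    entry << key
    @bucket.put(index_key(value), entry)
  end

  def remove(value, key)
    entry = lookup_set(value)
    entry.delete(key)
    @bucket.put(index_key(value), entry)
  end

  # Sorted array of primary keys recorded for this value.
  def lookup(value)
    lookup_set(value).to_a.sort
  end

  private

  def index_key(value)
    "idx:#{@name}:#{value}"
  end

  def lookup_set(value)
    @bucket.get(index_key(value)) || Set.new
  end
end

bucket = FakeBucket.new
by_doc = ManualIndex.new(bucket, 'doc')

[17, 18].each do |page|
  key = link_key(4000, page)
  bucket.put(key, "<links page='#{page}'/>")
  by_doc.add(4000, key)
end

direct = bucket.get(link_key(4000, 17))  # fetched without listing keys
keys_for_doc = by_doc.lookup(4000)       # answered from the index object
```

Against a real cluster, concurrent writers to the same index object would need conflict resolution, which this sketch ignores.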

With all of this discussion, it has been pointed out to me that there are two issues at hand, possibly conflated as one:

* Which is the least surprising: caching the key list, or incurring the large cost of the operation on every call? Or is the real problem that list-keys appears performant in development (small numbers of keys) but not in production (large numbers of keys)?
* How can we better discourage use of list-keys while still exposing it to developers who can handle the performance hit (or enjoy holes in their feet)?
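
For anyone joining the thread late, the caching trade-off in the first question can be sketched in a few lines of plain Ruby. This is a toy under stated assumptions, not the riak-client implementation: SketchBucket and list_calls are hypothetical names, an Array stands in for the cluster, and how the block form interacts with the memo is my assumption here.

```ruby
# Hypothetical stand-in for a bucket; NOT riak-client's actual code.
class SketchBucket
  attr_reader :list_calls

  def initialize(keys)
    @store = keys.dup   # pretend this lives on the cluster
    @list_calls = 0     # counts simulated list-keys round trips
    @keys = nil         # the memoized key list
  end

  # Removes the key from the "cluster" but leaves the memo untouched.
  def delete(key)
    @store.delete(key)
  end

  # Memoized unless :reload => true is passed or a block is given.
  def keys(options = {})
    if block_given?
      # Streaming form: always a fresh listing, yielded in chunks.
      @list_calls += 1
      @store.each_slice(2) { |chunk| yield chunk }
      return nil
    end
    if @keys.nil? || options[:reload]
      @list_calls += 1
      @keys = @store.dup
    end
    @keys
  end
end

bucket = SketchBucket.new(%w[4000-17.xml 4000-18.xml])
first = bucket.keys.first
bucket.delete(first)
stale = bucket.keys.first                   # memo still holds the deleted key
fresh = bucket.keys(:reload => true).first  # forces a second listing
```

The stale read is exactly the surprise Keith reported; the alternative is paying for a full listing on every call.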

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 12:52 PM, Aphyr wrote:

> Agreed. In fact, jrecursive pointed out to me last week that vnode operations are synchronous. That means that when you call list-keys, not only is it going to take a long time (right now upwards of 5 minutes) to complete, but while each vnode is returning its list of keys *it blocks any other requests*.
> 
> While list-keys is an unfortunate necessity for some things, its use should be minimized if you're going to get to any appreciable (100M keys) scale. I don't even know how we're going to use it at all above a billion. Possibly by listing the keys periodically from bitcask directly, and maintaining an index ourselves.
> 
> --Kyle
> 
> On 05/26/2011 09:40 AM, Sean Cribbs wrote:
>> With recent commits (
>> https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
>> ), it is cached until you either refresh it manually by passing :reload
>> => true or a block (for streaming key lists). This was the compromise
>> reached in that pull-request.
>> 
>> All of this caching discussion glosses over the fact that you *should
>> not list keys* in any real application. It really begs the question --
>> how often do you list keys in Redis, or memcached? I suspect that
>> generally you don't. This isn't a relational database. (Also, how often
>> do you actually do a full-table scan in MySQL? You don't if you're sane
>> -- you use an index, or even LIMIT + OFFSET.)
>> 
>> I'm tempted to remove Document::all and make Bucket#keys harder to
>> access, but the balance between discouraging bad behavior and exposing
>> available functionality is a hard one to strike. I don't want new
>> developers to immediately use list-keys and then be discouraged from
>> using Riak because it's slow; on the other hand, it /can be useful/ in
>> some circumstances. In those cases where it's useful, the developer
>> should probably be responsible enough to request the key list only once;
>> the caching behavior simply does this for them. I guess whether it
>> /should/ do this for them is the issue at hand.
>> 
>> All that said, I'm really torn on this issue, and the same problem
>> applies to full-bucket MapReduce. Caveat emptor.
>> 
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:
>> 
>>> How long is the key list cached like that, naturally?
>>> 
>>> Jonathan Langevin
>>> Systems Administrator
>>> Loom Inc.
>>> Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com
>>> - www.loomlearning.com - Skype: intel352
>>> 
>>> 
>>> On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs <sean at basho.com> wrote:
>>> 
>>>    Keith,
>>> 
>>>    There was a pull-request issue out for this on the Github project
>>>    (https://github.com/seancribbs/ripple/pull/168). For various
>>>    reasons, the list of keys is memoized in the Riak::Bucket
>>>    instance. Passing :reload => true to the #keys method will cause
>>>    it to refresh. I like to discourage list-keys, but with the
>>>    memoized list you don't shoot yourself in the foot as often.
>>> 
>>>    Sean Cribbs <sean at basho.com>
>>>    Developer Advocate
>>>    Basho Technologies, Inc.
>>>    http://basho.com/
>>> 
>>>    On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>>> 
>>>    > All -
>>>    >
>>>    > I just started working with Riak, and am using the riak-client
>>>    > Ruby gem.
>>>    >
>>>    > When I delete a key from a bucket and then try to fetch the value
>>>    > associated with that key, I get a 404 error (which is reasonable).
>>>    > However, the key remains in the bucket's list of keys (i.e. the
>>>    > value returned by bucket.keys). Why is the key still reported to
>>>    > exist in the bucket? Is bucket.keys cached, and therefore unaware
>>>    > of the deletion? Here's a riak-client Ruby script and its output
>>>    > in irb that illustrates this:
>>>    >
>>>    > ree-1.8.7-2010.02 :001 > require 'riak'
>>>    > => true
>>>    > ree-1.8.7-2010.02 :002 >
>>>    > ree-1.8.7-2010.02 :003 > client = Riak::Client.new
>>>    > => #<Riak::Client http://127.0.0.1:8098>
>>>    > ree-1.8.7-2010.02 :004 > bucket = client['links']
>>>    > => #<Riak::Bucket {links}>
>>>    > ree-1.8.7-2010.02 :005 > key = bucket.keys.first
>>>    > => "4000-17.xml"
>>>    > ree-1.8.7-2010.02 :006 > object = bucket[key]
>>>    > => #<Riak::RObject {links,4000-17.xml} [text/xml]:(6430 bytes)>
>>>    > ree-1.8.7-2010.02 :007 > object.delete
>>>    > => #<Riak::RObject {links,4000-17.xml} [text/xml]:(6430 bytes)>
>>>    > ree-1.8.7-2010.02 :008 > bucket.keys.first
>>>    > => "4000-17.xml"
>>>    > ree-1.8.7-2010.02 :009 > object = bucket[key]
>>>    > Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not found
>>>    >
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in `perform'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in `request'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in `reading_body'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1053:in `request'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1037:in `request'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:543:in `start'
>>>    > from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1035:in `request'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:47:in `perform'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in `tap'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in `perform'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend/transport_methods.rb:59:in `get'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend.rb:72:in `fetch_object'
>>>    > from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/bucket.rb:101:in `[]'
>>>    > from riak-delete-failure.rb:9
>>>    >
>>>    > Thanks,
>>>    > Keith
>>>    >
>>>    >
>>>    >
>>>    > _______________________________________________
>>>    > riak-users mailing list
>>>    > riak-users at lists.basho.com
>>>    > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com