Inconsistent map/reduce results

Keith Dreibelbis kdreibel at gmail.com
Tue Mar 29 21:16:27 EDT 2011


Followup to this (somewhat old) thread...

I had resolved my problem by setting vnode_cache_entries=0 in
app.config, as Grant suggested.  But some time later it began failing
again: I was getting misses of 25%-50% on records that should have been
found by map/reduce but weren't.  At that point I tried Rohman's suggestion
of using a random seed, and that worked around the problem successfully.
But this isn't a very satisfying fix.

So vnode_cache_entries=0 doesn't really fix it after all?  Is
there something else to put in the config that would make this work
properly, without the random-seed hack?  BTW, since the original thread I
have upgraded from 0.13 to 0.14, and the bug is still there.
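
For reference, the setting from Grant's message below goes in the riak_kv section of app.config.  A sketch of the relevant excerpt (only the vnode_cache_entries line is from the thread; the surrounding entries are elided):

```erlang
%% app.config excerpt (sketch): disable the map/reduce vnode cache.
{riak_kv, [
    %% ... existing riak_kv settings ...
    {vnode_cache_entries, 0}
]},
```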


Keith


On Thu, Mar 10, 2011 at 6:56 PM, Antonio Rohman Fernandez <
rohman at mahalostudio.com> wrote:

> if you want to avoid caching ( without configuration ), you can put a
> random variable in your map or reduce phase, or both... that does the trick
> for me, as the query is always different:
>
> $seed = randomStringHere;
>
> {"map":{"language":"javascript","source":"function(v,k,a) { var seed='.$seed.';
> var x=Riak.mapValuesJson(v)[0]; return [v.values[0].data]; }"}}
>
> Rohman
>
> On Thu, 10 Mar 2011 17:47:49 -0800, Keith Dreibelbis <kdreibel at gmail.com>
> wrote:
>
> Thanks for the prompt response, Grant.  I made the configuration change you
> suggested, and it fixed my problem.  Some follow-up questions:
>
> - is it possible to configure this dynamically on a per-bucket basis, or
> just per-server as it is now?
> - is this fixed in a newer version?
>
> On Thu, Mar 10, 2011 at 2:56 PM, Grant Schofield <grant at basho.com> wrote:
>
>> There are currently some bugs in the mapreduce caching system. The best
>> thing to do would be to disable the feature; on 0.13 you can do this by
>> editing or adding vnode_cache_entries in the riak_kv section of your
>> app.config. The entry would look like:
>>
>> {vnode_cache_entries, 0},
>>
>> Grant Schofield
>> Developer Advocate
>> Basho Technologies
>>
>> On Mar 10, 2011, at 4:16 PM, Keith Dreibelbis wrote:
>>
>> Hi riak-users,
>>
>> I'm trying to do a map/reduce query from Java against a 0.13 server, and
>> I'm getting inconsistent results.  What I'm doing should be pretty simple,
>> so I'm hoping someone will notice an obvious error in here, or have some
>> insight.
>>
>> This is an automated test.  I'm doing a simple query to get the keys for
>> records with a certain field value.  In SQL it would look like
>> "SELECT id FROM table WHERE age = '32'".  In Java I'm invoking it like
>> this:
>>
>> MapReduceResponse r = riak.mapReduceOverBucket(getBucket())
>>         .map(JavascriptFunction.anon(func), true)
>>         .submit();
>>
>> where riak is a RiakClient, getBucket() returns the name of the bucket,
>> and func is a string that looks like:
>>
>> function(value, keyData, arg) {
>>     var data = Riak.mapValuesJson(value)[0];
>>     if (data.age == "32")
>>         return [value.key];
>>     else
>>         return [];
>> }
>>
>> No reduce phase.  All entries in the example bucket are JSON and have an
>> age field.  This initially works correctly: it gets back the matching
>> records as expected, and it also works in curl.  Since it's an automated
>> test, each run uses a different bucket.  After about a dozen queries, this
>> starts to fail: it returns an empty result when it should have found
>> records.  It fails in curl at the same time.
>>
>> I initially suspected this might have something to do with doing a map/
>> reduce too soon after writing, before the write was visible on all nodes.
>> However, I changed the bucket schema entries for w, r, rw, and dw from
>> "quorum" to "all", and this still happens (is there another bucket setting
>> I missed?).  In addition, I only have 3 nodes (I'm using the dev123
>> example), and curl still fails long after the writes.
>>
>> Here's the strange part that makes me suspicious.  If I make
>> insignificant changes to the query, for example changing the double quotes
>> to single quotes, adding whitespace, or adding extra parentheses, then it
>> suddenly works again.  It will work on an existing bucket and on subsequent
>> tests, but again only for about a dozen runs before it starts failing
>> again, with the same behavior in curl.  This makes me suspect that the
>> server is doing some incorrect caching of this JS function, keyed on the
>> function string.
>>
>> Any explanation of what's going on?
>>
>> Keith
>>  _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
> --
> Antonio Rohman Fernandez <http://mahalostudio.com>
> CEO, Founder & Lead Engineer
> rohman at mahalostudio.com
>
> Projects:
> MaruBatsu.es <http://marubatsu.es>
> PupCloud.com <http://pupcloud.com>
> Wedding Album <http://wedding.mahalostudio.com>
>

